METHOD OF DETECTING PROTEIN-PROTEIN INTERACTIONS


Related Applications 
This application claims priority from U.S. Provisional Application No. 60/259,759
filed on January 4, 2001, which is incorporated herein by reference in its entirety.


Field of the Invention 
The present invention generally relates to methods for detecting protein-protein
interactions, and particularly to an in vitro two-hybrid system for detecting
protein-protein interactions.


Background of the Invention 
There has been much interest in protein-protein interactions in the field of
proteomics. A number of biochemical approaches have been used to identify interacting
proteins. These approaches generally exploit the affinity between interacting proteins
to isolate them in a bound state. Examples of such methods include
coimmunoprecipitation and copurification, optionally combined with cross-linking to
stabilize the binding. The isolated interacting partners can then be identified by,
e.g., mass spectrometry. See, e.g., Rout et al., J. Cell Biol., 148:635-651 (2000);
Houry et al., Nature, 402:147-154 (1999); Winter et al., Curr. Biol., 7:517-529 (1997).
A popular approach useful in large-scale screening is the phage display method, in which
filamentous bacteriophage particles are engineered by recombinant DNA technologies to
express a peptide or protein of interest fused to a capsid or coat protein of the
bacteriophage. A whole library of peptides or proteins of interest can be expressed, and
a bait protein can be used to screen the library to identify peptides or proteins
capable of binding to the bait protein. See, e.g., U.S. Patent Nos. 5,223,409; 5,403,484;
5,571,698; and 5,837,500. Notably, the phage display method identifies only those
proteins capable of interacting in an in vitro environment, while the
coimmunoprecipitation and copurification methods are not amenable to high-throughput
screening.

The yeast two-hybrid system is a genetic method that overcomes certain
shortcomings of the above approaches and has proven to be a powerful method for the
discovery of specific protein interactions in vivo. See generally, Bartel and Fields,
eds., The Yeast Two-Hybrid System, Oxford University Press, New York, NY, 1997. The yeast
two-hybrid technique is based on the fact that the DNA-binding domain and the
transcriptional activation domain of a transcriptional activator can still activate gene
transcription when they are contained in different fusion proteins, provided the two
domains are brought into proximity to each other. As shown in Figure 1, in a yeast
two-hybrid system, two fusion proteins are expressed in yeast cells. One has the
DNA-binding domain of a transcriptional activator fused to a test protein. The other
has the transcriptional activation domain of the same transcriptional activator fused to
another test protein. If the two test proteins interact with each other in vivo, the two
domains are brought together, reconstituting the transcriptional activator and
activating a reporter gene under its control. See, e.g., U.S. Patent No. 5,283,173.

Because of its simplicity, efficiency and reliability, the yeast two-hybrid system
has gained tremendous popularity in many areas of research, and numerous protein-protein
interactions have been identified with it. The identified proteins have contributed
significantly to the understanding of many signal transduction pathways and other
biological processes. For example, the yeast two-hybrid system has been successfully
employed to identify a large number of novel regulators of the complex cell cycle
regulatory network. Using known cell cycle regulators as baits, other proteins involved
in cell cycle control were identified by virtue of their ability to interact with the
baits. See generally, Hannon et al., in The Yeast Two-Hybrid System, Bartel and Fields,
eds., pages 183-196, Oxford University Press, New York, NY, 1997.

The classic yeast two-hybrid system depends on gene activation in the yeast
nucleus and has generally required that specific protein-protein interactions between
fusion proteins occur within the nucleus of yeast cells. Thus, although the conventional
yeast two-hybrid system has been used successfully to discover numerous protein
interactions, its usefulness is limited, e.g., for interactions that cannot take place in
the nucleus.

Summary of the Invention 

This invention provides a versatile and sensitive in vitro assay system for
detecting protein-protein interactions and for selecting compounds capable of modulating
protein-protein interactions. In particular, the present invention utilizes inteins,
which are peptide sequences capable of directing protein trans-splicing in vitro. An
intein is an intervening protein sequence in a protein precursor that is excised from
the precursor during protein splicing. Protein splicing results in the concomitant
ligation of the flanking protein fragments, i.e., the exteins, with a native peptide
bond, thus forming a mature extein protein and the free intein. It is now known that
inteins incorporated into non-native precursors can also cause protein splicing and
their own excision. In addition, an N-terminal intein fragment in one fusion protein and
a C-terminal intein fragment in another fusion protein, when brought into contact with
each other, can bring about trans-splicing between the two fusion proteins. Thus, in
accordance with the present invention, two hybrid fusion constructs are provided. One
has a first test agent and an N-terminal intein fragment, or N-intein, and the other has
a second test agent and a C-terminal intein fragment, or C-intein. In addition, one or
both fusion constructs may have a reporter that undergoes detectable changes upon
trans-splicing of the fusion constructs. If the first and second test agents interact
with each other, thus bringing the N-intein and C-intein into close proximity, protein
trans-splicing takes place. As a result, the fusion constructs are spliced, causing
detectable changes in the reporter. Thus, by detecting the changes in the reporter,
interactions between two test agents can be determined.

Intein-based trans-splicing can take place in vitro in a cell-free environment.
Therefore, the assay system of the present invention can be used for convenient and
rapid in vitro analysis of protein-protein interactions. In particular, the system is
uniquely suited for protein array-based, high-throughput in vitro screening of
protein-protein interactions. Such an array-based assay combines the high throughput of
the array format with the system's other advantageous features, and is therefore
powerful and versatile.

Additionally, protein trans-splicing mediated by the N-intein and C-intein is
independent of other cellular factors and does not require the action of additional
proteins such as proteases. This makes the assay system of the present invention more
reliable and easier to perform as compared to the assay methods known in the art for
detecting protein-protein interactions.

Another distinct feature of the intein-based assay is that the detection of
protein-protein interaction is based on the occurrence of protein trans-splicing events,
which typically are associated with protein cleavage and result in new protein
structures and functions. Thus, the intein-based assay is well suited to exploit the
numerous direct and indirect methods available in the art for detecting changes in
protein structures and functions. Because the intein-based assay can accommodate these
numerous detection methods, there is great flexibility in choosing methods that are
optimal for a particular condition.

In addition, certain interacting proteins or fusion proteins are inherently toxic to 
cells, and therefore present problems for in vivo two-hybrid systems. The intein-based in 
vitro two-hybrid system is especially suited for such proteins. 

Moreover, the traditional two-hybrid systems have been largely ineffective in
detecting protein-protein interactions between membrane proteins and extracellular
proteins due to the requirement that the interaction of interest take place in the cell
nucleus or cytosol. In contrast, the protein-protein interactions in the intein-based in
vitro two-hybrid system of this invention are detected in vitro. Thus, the system is
particularly useful for studying interactions that demand a non-cellular environment.

Similarly, the in vitro system is especially suitable where non-protein elements
that cannot be synthesized by recombinant DNA technologies are involved. For example,
the system can be used to study interactions between non-protein agents. In addition,
non-protein reporters, which typically are not useful in in vivo systems, can also be
utilized. Indeed, in many such cases in vitro assays may be the only feasible two-hybrid
technologies.

The system of the present invention can also be used to select compounds capable
of modulating protein-protein interactions. Although the traditional in vivo two-hybrid
systems have been employed to identify such compounds, their usefulness is limited for
several reasons. For example, the traditional in vivo two-hybrid systems are not
amenable to identifying active compounds that are toxic to the host cell. Nor are they
applicable to compounds that are unable to cross the host cell membrane or that are
rapidly transported out of the host cell. In addition, resident cellular proteins other
than the interacting proteins of interest can obscure the effects of certain compounds
by binding the compounds. In contrast, the intein-based in vitro system of the present
invention is not subject to these limitations inherent in the in vivo systems.

Accordingly, in accordance with a first aspect of the present invention, a method
for detecting protein-protein interaction in vitro is provided. Briefly, two fusion
proteins are prepared and allowed to interact with each other. One of the two fusion
proteins includes an N-intein and a first test polypeptide, and the other fusion protein
includes a C-intein and a second test polypeptide. One or both of the two fusion
proteins have an inactive reporter capable of being converted to an active reporter upon
trans-splicing through the N-intein and the C-intein. The change in the level of active
reporter is then determined. An increase in the amount of active reporter indicates
that the first and second test polypeptides interact with each other, e.g., through
binding affinity, resulting in trans-splicing of the two fusion proteins mediated by the
N-intein and the C-intein. Preferably, the N-intein and C-intein are not associated with
each other and do not exhibit any significant binding affinity for each other. Nor do
they associate with or bind to the inactive reporter or the test polypeptides in the
fusion proteins.
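
The decision at the end of this first aspect, whether the active reporter has increased, can be sketched as a comparison of replicate reporter readings from the test reaction against a control reaction. This is a minimal illustration only: it assumes numeric readings (for example, fluorescence units), a control in which the test polypeptides are not expected to interact, and a hypothetical fold-change cutoff `min_fold`; none of these parameters is specified by the method, and the function name is illustrative.

```python
from statistics import mean


def interaction_detected(test_signals, control_signals, min_fold=2.0):
    """Call an interaction when the active-reporter signal rises over control.

    test_signals: replicate readings from the reaction containing both fusion
        proteins (N-intein/first test polypeptide and C-intein/second test
        polypeptide).
    control_signals: replicate readings from a control in which the test
        polypeptides are not expected to interact.
    min_fold: hypothetical fold-change cutoff; not specified by the method.
    """
    if not test_signals or not control_signals:
        raise ValueError("need at least one reading for test and control")
    test_mean = mean(test_signals)
    control_mean = mean(control_signals)
    if control_mean <= 0:
        # No measurable background: any positive signal counts as an increase.
        return test_mean > 0
    return test_mean / control_mean >= min_fold


print(interaction_detected([950, 1010, 980], [110, 95, 105]))  # True
print(interaction_detected([120, 130, 115], [110, 95, 105]))   # False
```

A fold-change rule is only one possible choice; replicate statistics or a fixed signal threshold could be substituted depending on the reporter and detection means.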

In one embodiment, the inactive reporter can be a polypeptide linked to one of the
fusion proteins, and is cleaved off into a free form from the fusion protein upon
protein trans-splicing. The reporter polypeptide can be selected and the fusion proteins
can be designed such that the precursor form of the polypeptide is inactive while the
free reporter released from the fusion protein is active, i.e., is detectable directly or
indirectly.

In another embodiment, one of the two fusion proteins has a nonfunctional
portion of a reporter polypeptide linked to the N-terminus of the N-intein. The other
fusion protein comprises a distinct but similarly nonfunctional portion of the same
reporter polypeptide linked to the C-terminus of the C-intein. Upon trans-splicing
between the two fusion proteins through the N- and C-inteins, the two inactive reporter
polypeptides are ligated together with a peptide bond, thereby forming an active
reporter protein, which is detectable directly or indirectly.
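
The split-reporter embodiment can be illustrated with a toy model that tracks only which segments end up joined by peptide bonds. Each fusion protein is written as a list of named segments from N- to C-terminus. The segment names `RepN`, `RepC`, `bait`, and `prey` and the functions `trans_splice` and `reporter_active` are hypothetical, the placement of the test polypeptides on the intein ends away from the reporter fragments is an assumption for illustration, and the model ignores the underlying chemistry.

```python
from typing import List, Tuple

Segment = str  # a named region of a fusion protein (hypothetical labels)


def trans_splice(n_fusion: List[Segment], c_fusion: List[Segment],
                 interacting: bool) -> Tuple[List[List[Segment]], bool]:
    """Toy model of intein-mediated trans-splicing between two fusion proteins.

    n_fusion: segments N- to C-terminal; must contain "N-intein".
    c_fusion: segments N- to C-terminal; must contain "C-intein".
    interacting: whether the test polypeptides bind, bringing the N-intein
        and C-intein into proximity.
    """
    if not interacting:
        # No proximity, no splicing: both fusion proteins remain unchanged.
        return [list(n_fusion), list(c_fusion)], False
    i = n_fusion.index("N-intein")
    j = c_fusion.index("C-intein")
    n_extein = n_fusion[:i]        # everything N-terminal to the N-intein
    c_extein = c_fusion[j + 1:]    # everything C-terminal to the C-intein
    ligated = n_extein + c_extein  # exteins joined by a new peptide bond
    n_remainder = n_fusion[i:]     # N-intein plus the segments after it
    c_remainder = c_fusion[:j + 1]  # segments before the C-intein plus it
    return [ligated, n_remainder, c_remainder], True


def reporter_active(products: List[List[Segment]]) -> bool:
    """Active only if RepN is directly followed by RepC in a single chain."""
    for chain in products:
        if "RepN" in chain and "RepC" in chain:
            return chain.index("RepN") + 1 == chain.index("RepC")
    return False


n = ["RepN", "N-intein", "bait"]
c = ["prey", "C-intein", "RepC"]
products, spliced = trans_splice(n, c, interacting=True)
print(products)  # [['RepN', 'RepC'], ['N-intein', 'bait'], ['prey', 'C-intein']]
print(reporter_active(products))  # True
```

With `interacting=False`, the two fusion proteins are returned unchanged and `reporter_active` reports no active reporter, mirroring the negative result expected when the test polypeptides do not bind.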

The assay is conducted in vitro by mixing together the two fusion proteins under
conditions suitable for protein interactions and for protein trans-splicing.
Alternatively, the fusion proteins can be recombinantly expressed separately in
different host cells, and cell lysates or crude extracts prepared from the cells can be
mixed to allow protein-protein interaction. The active reporter protein is then
detected.

The assay can also be conducted in the presence of a third polypeptide. In this
manner, an interaction between the first and second test polypeptides can be detected
even if the interaction requires the presence of the third polypeptide. The third
polypeptide may be a protein having affinity for the first test polypeptide, the second
test polypeptide, or both. Alternatively, the third polypeptide can modify one or both
test polypeptides, e.g., by phosphorylation, glycosylation, and the like.

The techniques used for monitoring the occurrence of protein trans-splicing
events and detecting an active reporter will depend on the inactive reporter used and
the active reporter derived therefrom. The system of the present invention can be
designed such that an active reporter can be detected based on changes in protein size
or other properties, or on activation of certain protein functions.

In accordance with a second aspect of the present invention, the above-described
assay system is employed to determine whether a compound is capable of modulating an
interaction between a first polypeptide and a second polypeptide. Essentially, two
fusion proteins are provided as described above, except that the first and second
polypeptides are known to interact with each other. The interaction between the two
fusion proteins is then determined in the presence of the test compound.
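
One way to quantify the effect of a test compound in this second aspect is to express the active-reporter signal as percent inhibition relative to a reaction without compound, after subtracting a background measured when the polypeptides cannot interact: 100 × (1 − (S_compound − B) / (S_vehicle − B)). The formula and the function `percent_inhibition` are illustrative assumptions, not a procedure given in this disclosure. Positive values suggest the compound inhibits the interaction; negative values suggest it enhances it.

```python
def percent_inhibition(signal_with_compound, signal_vehicle, background):
    """Percent inhibition of the interaction-dependent reporter signal.

    Illustrative formula (an assumption, not from the source):
        100 * (1 - (S_compound - B) / (S_vehicle - B))
    Returns 100 for complete inhibition, 0 for no effect, and a negative
    value when the compound increases the signal (a possible enhancer).
    """
    window = signal_vehicle - background  # interaction-dependent signal range
    if window <= 0:
        raise ValueError("vehicle signal must exceed background")
    return 100.0 * (1.0 - (signal_with_compound - background) / window)


print(round(percent_inhibition(300.0, 1000.0, 100.0), 1))  # 77.8
```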

The foregoing and other advantages and features of the invention, and the manner
in which the same are accomplished, will become more readily apparent upon consideration
of the following detailed description of the invention taken in conjunction with the
accompanying examples and drawings, which illustrate preferred or exemplary embodiments.

Brief Description of the Drawings
Figure 1 is an illustration of the classic yeast two-hybrid system known in the art;
Figure 2A illustrates a genetic selection process for selecting N-inteins and C-inteins
that do not interact with each other;
Figure 2B shows a process for verifying that the selected non-interacting N-intein and
C-intein are capable of mediating protein trans-splicing;
Figures 3A-3F are diagrams illustrating the fusion constructs in different embodiments
of the present invention; and
Figure 4 is a diagram illustrating an embodiment of the present invention in which a
modifying enzyme is expressed in a multi-hybrid system and interaction between the
modified proteins is detected.

Detailed Description of the Invention

The term "compound" as used herein encompasses all types of organic or
inorganic molecules, including but not limited to proteins, peptides, polysaccharides,
lipids, nucleic acids, small organic molecules, inorganic compounds, and derivatives
thereof.

As used herein, the terms "polypeptide," "protein," and "peptide" are used
interchangeably to refer to amino acid chains in which the amino acid residues are linked
by covalent peptide bonds. The amino acid chains can be of any length of at least two
amino acids, including full-length proteins. Unless otherwise specified, the terms
"polypeptide," "protein," and "peptide" also encompass various modified forms thereof,
including but not limited to glycosylated forms, phosphorylated forms, etc.

The term "test agent" means a chemical compound, preferably an organic compound, to be
tested in the present invention to determine its ability to interact with another
chemical compound. Test agents may include various forms of organic compounds, or
combinations or conjugates thereof. In one embodiment, the test agents are
polypeptides, in which case they are termed "test polypeptides" or "test proteins."

The term "fusion construct" refers to a non-naturally occurring hybrid or chimeric
construct having two or more distinct portions covalently linked together, each portion
being or being derived from a specific molecule. When two or more portions in such a
fusion construct are polypeptides linked together by peptide bonds, the fusion
construct is conveniently referred to as a "fusion protein."

As used herein, the term "interacting" or "interaction" means that two domains or
independent entities exhibit sufficient physical affinity for each other to bring the
two "interacting" domains or entities physically close to each other. An extreme case of
interaction is the formation of a chemical bond that results in continual, stable
proximity of the two domains. Interactions that are based solely on physical affinities,
although usually more dynamic than chemically bonded interactions, can be equally
effective at co-localizing independent entities. Examples of physical affinities and
chemical bonds include, but are not limited to, forces caused by electrical charge
differences, hydrophobicity, hydrogen bonds, van der Waals forces, ionic forces, covalent
linkages, and combinations thereof. The state of proximity between the interacting
domains or entities may be transient or permanent, reversible or irreversible. In any
event, it is in contrast to and distinguishable from contact caused by natural random
movement of two entities. Typically, although not necessarily, an "interaction" is
exhibited by binding between the interacting domains or entities. Examples of
interactions include specific interactions between antigen and antibody, ligand and
receptor, and the like.

An "interaction" between two protein domains, fragments or complete proteins can be
determined by a number of methods other than the system of the present invention. For
example, an interaction can be determined by functional assays such as the two-hybrid
systems. Protein-protein interactions can also be determined by various biophysical and
biochemical approaches based on the affinity binding between the two interacting
partners. Such biochemical methods generally known in the art include, but are not
limited to, protein affinity chromatography, affinity blotting, immunoprecipitation, and
the like. The binding constant for two interacting proteins, which reflects the strength
or quality of the interaction, can also be determined using methods known in the art.
See Phizicky and Fields, Microbiol. Rev., 59:94-123 (1995).

As used in the present disclosure, the term "reporter" means a molecule, or a moiety or
domain thereof, that can be used as a marker for determining the occurrence of protein
trans-splicing. An "inactive reporter" is a form of the reporter that is not detectable
by a particular detection means, while an "active reporter" is a form of the reporter
that is detectable by that detection means. It should be recognized that the terms
"detectable" and "not detectable" are used herein in a relative sense. In essence, there
should be a measurable or detectable change in the reporter, either quantitative or
qualitative, upon intein-based trans-splicing. For purposes of the present discussion,
"active reporters" include both reporters that are directly detectable and reporters
that are detectable indirectly using a predetermined technique.

Many reporters are known in the art, and the selection and application of any of
those reporters to the present invention should be apparent to a skilled artisan apprised
of the present disclosure. Examples include, but are not limited to, β-galactosidase
(β-Gal), encoded by the lacZ gene, which converts X-Gal into a blue product; and green
fluorescent protein (GFP), which allows cells to be sorted by fluorescence-activated
cell sorting (FACS). See Cubitt et al., Trends Biochem. Sci., 20:448-455 (1995).

Typically, an inactive reporter can be converted to an active reporter upon
trans-splicing in the method of this invention. For example, a molecule fused to a
construct of the present invention may not be detectable and is thus referred to as "an
inactive reporter." The fused form may be released from the fusion construct into a free
form of the molecule that is detectable. This detectable free form is referred to as an
"active reporter," in contrast to the "inactive," undetectable bound form of the
reporter. In another example, two inactive reporters are fused to an N-intein and a
C-intein, respectively, and upon trans-splicing, the two inactive reporters are ligated
together, forming a detectable active reporter. For this purpose, fragments of an active
reporter that are not detectable can also be referred to as "inactive reporters." Thus,
an N-terminal fragment of a reporter protein is fused to an N-intein and a C-terminal
fragment of the reporter protein is fused to a C-intein. Upon protein trans-splicing
mediated by the N- and C-inteins, the N-terminal and C-terminal fragments can be
ligated, thereby forming a full-length, detectable active reporter protein.

As is known in the art, inteins are intervening protein sequences in protein precursors
which are excised, or removed, from the protein precursors during protein splicing. The
protein sequences flanking inteins are called exteins. The excision of an intein is
associated with the concomitant ligation of the N-extein (the protein sequence
N-terminal to the intein) and the C-extein (the protein sequence C-terminal to the
intein) through a native peptide bond, thus forming a mature extein protein and a free
intein. See Perler et al., Nucleic Acids Res., 22:1125-1127 (1994). The entire protein
splicing process is autocatalyzed by the intein and is believed to be independent of
specific host cell factors. Indeed, intein-based protein splicing has been shown to occur
in vitro as well as in heterologous organisms. See Perler et al., Cell, 92:1-4 (1998).
Intein-based protein splicing has also been shown to be independent of the native
flanking exteins: hybrid protein sequences containing inteins fused to non-native
polypeptide sequences are able to undergo protein splicing to excise the inteins and
ligate the flanking polypeptide sequences. See, e.g., Evans et al., J. Biol. Chem.,
274:3923-3926 (1999); Evans et al., J. Biol. Chem., 275:9091-9094 (2000).
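
The net result of cis protein splicing can be summarized as a sequence operation: the intein is removed and the N-extein and C-extein are joined. The sketch below assumes one-letter amino-acid strings and intein boundaries that are already known, given as zero-based slice indices; the function name is hypothetical, the example sequences are arbitrary placeholders rather than a real intein, and the model captures only the products, not the autocatalytic chemistry.

```python
from typing import Tuple


def cis_splice(precursor: str, intein_start: int, intein_end: int) -> Tuple[str, str]:
    """Return (mature_extein, free_intein) for a protein precursor.

    precursor: one-letter amino-acid sequence, N- to C-terminal.
    intein_start, intein_end: zero-based slice bounds of the intein, so the
        intein is precursor[intein_start:intein_end].
    """
    if not 0 < intein_start < intein_end < len(precursor):
        raise ValueError("intein must be flanked by an N-extein and a C-extein")
    n_extein = precursor[:intein_start]
    intein = precursor[intein_start:intein_end]
    c_extein = precursor[intein_end:]
    # Splicing ligates the flanking exteins through a peptide bond.
    return n_extein + c_extein, intein


# Placeholder precursor: N-extein "MKLV", intein "CGGGGHN", C-extein "SWYF".
mature, free_intein = cis_splice("MKLV" + "CGGGGHN" + "SWYF", 4, 11)
print(mature, free_intein)  # MKLVSWYF CGGGGHN
```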

Certain amino acid sequences within an intein are dispensable for protein splicing.
Based on sequence comparison and structural analysis, it is now known that the residues
responsible for splicing lie within approximately the N-terminal 100 amino acids and
approximately the C-terminal 50 amino acids of the intein. See, e.g., Duan et al., Cell,
89:555-564 (1997); Hall et al., Cell, 91:85-97 (1997); Klabunde et al., Nature Struct.
Biol., 5:31-36 (1998). Indeed, a functional mini-intein can be produced by deleting the
centrally located, dispensable amino acid sequence, leaving the N-terminal sequence of
about 100 amino acids fused directly to the C-terminal sequence of about 50 amino acids.
See, e.g., Wu et al., Biochim. Biophys. Acta, 1387:422-432 (1998). In addition, inteins
have been identified that can mediate trans-splicing even when the N-terminal intein
sequence and the C-terminal intein sequence are in different proteins. See id.; see
also, Shingledecker et al., Gene, 207:187-195 (1998); Evans et al., J. Biol. Chem.,
274:3923-3926 (1999); Evans et al., J. Biol. Chem., 275:9091-9094 (2000).
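
The mini-intein construction described above amounts to deleting the central region and fusing the two splicing regions. A minimal sketch, assuming the approximate lengths given above (about 100 N-terminal and about 50 C-terminal residues) as default parameters; in practice the boundaries would be chosen from sequence alignment and structural data, the function name is hypothetical, and the demonstration sequence is a placeholder.

```python
def make_mini_intein(intein: str, n_len: int = 100, c_len: int = 50) -> str:
    """Delete the central region of an intein, keeping its splicing regions.

    intein: full-length intein sequence (one-letter code).
    n_len: residues kept from the N-terminus (about 100 by default).
    c_len: residues kept from the C-terminus (about 50 by default).
    """
    if n_len <= 0 or c_len <= 0:
        raise ValueError("both splicing regions must be retained")
    if n_len + c_len >= len(intein):
        raise ValueError("intein too short: no central region to delete")
    # Fuse the N-terminal splicing region directly to the C-terminal one.
    return intein[:n_len] + intein[-c_len:]


# Placeholder 400-residue intein: A = N-terminal region, E = central region,
# G = C-terminal region.
full = "A" * 100 + "E" * 250 + "G" * 50
mini = make_mini_intein(full)
print(len(mini))  # 150
```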

The present invention utilizes the trans-splicing capability of inteins to provide a
method for detecting interactions between test agents. Thus, in accordance with the
present invention, two fusion constructs are provided: one has a first test agent and an
N-intein, and the other has a second test agent and a C-intein. In addition, one or both
fusion constructs have an inactive reporter that undergoes detectable changes upon
intein-mediated trans-splicing of the fusion constructs. If the first and second test
agents interact with each other and bring the N-intein and C-intein into close proximity
to each other, protein trans-splicing takes place. As a result, the fusion constructs are
trans-spliced and/or re-ligated, causing detectable changes in the reporter. By detecting
the changes in the reporter, the interaction between two test agents can be determined.

As used herein, the terms "N-intein" and "C-intein" refer to an N-terminal and a
C-terminal portion of an intein, respectively. An N-intein alone cannot direct protein
splicing, and likewise, a C-intein alone is incapable of catalyzing protein splicing.
However, when an N-intein and a C-intein are placed in close proximity, they are capable
of acting in concert to catalyze protein trans-splicing. Conserved intein motifs have
been identified in many inteins. Typically, an intein includes an N-terminal splicing
region having sequence motifs designated A, N2, B, and N4; an endonuclease or linker
domain region having sequence motifs designated C, D, E, and H; and a C-terminal
splicing region having sequence motifs designated F and G. See Pietrokovski, Protein
Sci., 3:2340-2350 (1994); Pietrokovski, Protein Sci., 7:64-71 (1998). Thus, in a
specific embodiment, the N-intein encompasses at least motifs A, N2, B, and N4, while
the C-intein includes at least motifs F and G. Typically, an "N-intein" is an amino acid
sequence matching the N-terminal sequence of about 90 to 110 amino acids of an intein,
while a "C-intein" is an amino acid sequence matching the C-terminal sequence of about 30
to 50 amino acids of an intein. A skilled artisan will recognize that optimal sequences
of N-inteins and C-inteins can be determined by routine trial-and-error experiments. In
addition, it should be understood that the terms "N-intein" and "C-intein" also
encompass non-native or modified amino acid sequences that are derived from an
N-terminal or C-terminal portion of an intein, respectively, e.g., modified or mutein
forms containing amino acid insertions, deletions, or substitutions.
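
The typical length ranges above can serve as a simple sanity check when an intein sequence is divided into an N-intein and a C-intein for a pair of fusion constructs. In the sketch below, `split_intein` is a hypothetical helper, the fragment lengths are user-chosen parameters, the ranges encode only the typical values stated above (not hard limits), the omitted central residues stand for the endonuclease or linker region, and the demonstration sequence is a placeholder; as noted, optimal fragments are found experimentally.

```python
from typing import Tuple

N_INTEIN_TYPICAL = range(90, 111)  # about 90 to 110 residues
C_INTEIN_TYPICAL = range(30, 51)   # about 30 to 50 residues


def split_intein(intein: str, n_len: int, c_len: int) -> Tuple[str, str, bool]:
    """Derive an N-intein and a C-intein from one intein sequence.

    Returns (n_intein, c_intein, typical), where `typical` reports whether
    both fragment lengths fall within the typical ranges. Residues between
    the two fragments (e.g., an endonuclease or linker region) are omitted.
    """
    if n_len <= 0 or c_len <= 0 or n_len + c_len > len(intein):
        raise ValueError("fragments must be non-empty and must not overlap")
    n_intein = intein[:n_len]
    c_intein = intein[len(intein) - c_len:]
    typical = n_len in N_INTEIN_TYPICAL and c_len in C_INTEIN_TYPICAL
    return n_intein, c_intein, typical


full = "A" * 100 + "E" * 250 + "G" * 50  # placeholder 400-residue intein
n_part, c_part, typical = split_intein(full, n_len=100, c_len=40)
print(len(n_part), len(c_part), typical)  # 100 40 True
```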

Protein precursors containing inteins have been found in all three domains of life:
archaea, bacteria, and eucarya. A large number of inteins exist in bacteria, and a few
have also been found in yeast. See Perler et al., Nucleic Acids Res., 28:344-345 (2000);
see also InBase, the New England Intein Database, at http://www.neb.com/neb/inteins.html.
The N-intein and C-intein used in the fusion constructs of the present invention can be
selected according to the naturally occurring intein sequences. Alternatively, the
naturally occurring intein sequences can be modified by deleting, inserting, or
substituting amino acids to generate desirable properties in the N- and C-inteins.

Some naturally occurring native N-inteins and C-inteins are known to interact
with each other. This may cause undesirable background and could yield a high frequency
of false positives. To minimize the background and increase assay sensitivity in the
present invention, it is preferred to use an N-intein and a C-intein that do not
substantially interact with each other. That is, they neither exhibit sufficient
physical affinity for each other nor form chemical bonds between them so as to bring
them physically close enough to cause substantial protein trans-splicing. Such
non-interaction is operationally defined as the inability of an N-intein/C-intein pair
to yield an active reporter when fused to test agents known to have no affinity for one
another.

If the N-intein and C-intein have relatively high affinity for each other, they can
be mutated to minimize their interaction. Alternatively, as will be described in detail
below, competitive inhibitors of the reporters can be applied to minimize background
detection signals. In this way, the detection signal from the active reporter produced by
the interaction between the test proteins will be sufficiently greater than the
background detection signal that the interaction between the test proteins can be
distinguished from the background interaction between the N-intein and C-intein.

Various trans-splicing assays may be used in combination with recombinant
mutagenesis techniques to generate an N-intein and a C-intein that do not interact with
each other and yet are capable of catalyzing protein trans-splicing when brought into
proximity to each other. Conveniently, a genetic selection assay can be employed. For
example, as shown in Figure 2A, two chimeric genes can be prepared using standard
recombinant DNA technologies. One chimeric gene encodes a fusion protein containing the
N-terminal fragment of a reporter protein fused, at its C-terminus, to the N-terminus of
an N-intein. The other chimeric gene encodes a fusion protein having a C-intein fused, at
its C-terminus, to the N-terminus of the C-terminal fragment of the reporter protein. The
N- and C-terminal fragments of the reporter protein should not interact with each other
or with the N- or C-intein. They can be of any length so long as an active reporter
protein can be generated when they are ligated together through protein trans-splicing
mediated by the N- and C-inteins. The genetic selection assay can be performed in any
suitable host cells, preferably in the same type of cells in which the protein-protein
interaction detection assay is conducted. The two chimeric genes are introduced into a
host cell for expression of the two fusion proteins. Alternatively, in the case of yeast
cells, they can be introduced into two yeast cells of different mating types, which are
subsequently mated. If the N-intein and C-intein thus expressed interact with each
other, an active reporter will be detectable in the host cell. To obtain N-inteins and
C-inteins that do not interact with each other, the DNA coding regions for the N-intein
and C-intein are mutated using standard mutagenesis techniques to create changes in the
amino acid sequences of the N- and C-inteins. The mutant chimeric genes thus generated are
then introduced into host cells for the genetic selection assay described above. If the
active reporter is cytotoxic or cytostatic, one can select for those yeast cells that
express mutant N- and C-inteins that fail to interact spontaneously. Finally, both the
N- and C-extein fusion proteins can be C-terminally tagged with an epitope to allow
immunologic confirmation of expression of the non-interacting intein mutants. In this
manner, random mutations can be introduced into the N- and C-inteins, and those mutant
N-inteins and C-inteins that do not interact with each other are selected. See Figure 2A.
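
The selection criteria described above (no spontaneous reporter signal, with expression of both fusion proteins confirmed through the epitope tag) can be expressed as a filter over screening results. This is a sketch under stated assumptions: the `PairResult` record, the `select_non_interacting` function, the numeric background cutoff, and the example mutant names and values are hypothetical and invented for illustration, and a retained pair would still need to be shown capable of trans-splicing when brought into proximity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairResult:
    """Screening result for one N-intein/C-intein pair (hypothetical record)."""
    name: str
    reporter_signal: float    # signal from the split-reporter fusions alone
    n_fusion_expressed: bool  # epitope tag detected on the N-intein fusion
    c_fusion_expressed: bool  # epitope tag detected on the C-intein fusion


def select_non_interacting(results: List[PairResult], background: float) -> List[str]:
    """Keep pairs that are expressed yet give no reporter signal above background.

    Requiring expression excludes mutants that appear non-interacting only
    because one of the fusion proteins was not made.
    """
    selected = []
    for r in results:
        expressed = r.n_fusion_expressed and r.c_fusion_expressed
        if expressed and r.reporter_signal <= background:
            selected.append(r.name)
    return selected


# Invented example values for illustration only.
screen = [
    PairResult("wild-type", 850.0, True, True),  # spontaneous association
    PairResult("mut-1", 40.0, True, True),       # candidate non-interacting pair
    PairResult("mut-2", 35.0, True, False),      # C-intein fusion not expressed
]
print(select_non_interacting(screen, background=50.0))  # ['mut-1']
```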

Besides random mutagenesis, site-directed mutagenesis can also be used to
change amino acid sequences in wild-type N- and C-inteins in a predetermined manner.
For example, amino acid sequences can be modified to create consensus sequences for
phosphorylation by protein kinases or for glycosylation. Alternatively, certain amino
acids in wild-type N- and C-intein sequences can be chemically modified, e.g., by
incorporating non-natural amino acids or by chemically linking certain moieties to amino
acid side chains.

The selection of non-interacting N-intein and C-intein can also be done in an in
vitro assay. For example, fusion proteins containing wild-type or mutated N- or C-inteins
expressed from the above-described chimeric genes can be purified by standard
chromatographic or affinity techniques or prepared in crude cell extracts. Fusion protein
pairs (in which one contains an N-intein and the other contains a C-intein) are then mixed
and incubated together in vitro under appropriate conditions to promote protein splicing
as described below.

The N- and C-inteins thus selected can be further tested for their ability to
catalyze protein trans-splicing in a host cell. For this purpose, the selected chimeric
genes containing the desired N- and C-intein coding sequences are further modified.
Figure 2B illustrates an example of this verification process. Essentially, a pair of new
chimeric genes is constructed and introduced into a host cell for expressing a pair of
fusion proteins. One chimeric gene encodes a fusion protein containing the above-
described N-terminal fragment of a reporter protein fused, at its C-terminus, to the
N-terminus of an N-intein, and a bait protein fused to the C-terminus of the N-intein. The
other chimeric gene encodes a fusion protein having a C-intein fused, at its C-terminus, to
the N-terminus of the above-described C-terminal fragment of a reporter protein, and a
prey protein fused to the N-terminus of the C-intein. The bait protein and prey protein
are known to interact with each other. Any pair of interacting proteins known in the art
can be used for this purpose, such as the interacting pairs FKBP12 and TGFβR1;
FKBP12 and FRAP; thyroid hormone receptor α and nuclear corepressor 1; and Ras and Raf.
See Huang and Schreiber, Proc Natl Acad Sci USA, 94:13396-401 (1997); Rossi et al.,
Proc Natl Acad Sci USA, 94:8405-10 (1997); Chen and Evans, Nature, 377:454-7 (1995);
Pelletier et al., Proc Natl Acad Sci USA, 95:12141-6 (1998). After the new chimeric
genes are expressed in a host cell to produce the fusion proteins, the active reporter is
detected to determine whether trans-splicing has occurred. In this manner, N-inteins and
C-inteins that do not interact with each other but are nevertheless capable of mediating
protein trans-splicing when they are brought into proximity can be identified.
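
The selection logic of Figures 2A and 2B can be summarized as a simple filter over screening readouts. The sketch below is a minimal illustration only. It assumes each mutant pair has already been scored as yes/no readouts. The record type `MutantPairResult`, its field names, and the pair names are hypothetical and are not part of the described method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MutantPairResult:
    """Hypothetical screening readouts for one mutant N-intein/C-intein pair."""
    name: str
    spontaneous_reporter: bool    # Figure 2A-type assay: reporter active without bait/prey
    n_tag_seen: bool              # epitope tag on the N-extein fusion detected
    c_tag_seen: bool              # epitope tag on the C-extein fusion detected
    splices_with_bait_prey: bool  # Figure 2B-type assay: reporter active with a known pair


def non_interacting(r: MutantPairResult) -> bool:
    # Both fusions are expressed (tags seen), yet no reporter forms on its own,
    # so the absence of signal is not just a failure to express.
    return r.n_tag_seen and r.c_tag_seen and not r.spontaneous_reporter


def select_pairs(results: list[MutantPairResult]) -> list[str]:
    # Keep pairs that do not interact spontaneously but still trans-splice when
    # a known interacting bait/prey pair brings them into proximity.
    return [r.name for r in results if non_interacting(r) and r.splices_with_bait_prey]


screen = [
    MutantPairResult("wt", True, True, True, True),    # interacts spontaneously
    MutantPairResult("m1", False, False, True, True),  # N-fusion not expressed
    MutantPairResult("m2", False, True, True, False),  # no splicing even with bait/prey
    MutantPairResult("m3", False, True, True, True),   # desired phenotype
]
print(select_pairs(screen))  # ['m3']
```

The epitope-tag check is kept separate from the reporter check because a pair that is simply not expressed would otherwise look like a non-interacting pair.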

It should be recognized that, although much of the description below is focused
on protein-protein interactions, the method of the present invention for detecting
interactions is applicable to any test agents, preferably macromolecules. For example,
interactions among macromolecules such as oligosaccharides, lipids, nucleic acids,
proteins, organic molecules including steroids and other drugs, viruses, and cells can all
be detected by the present method. Thus, in accordance with the present invention, two
fusion constructs can be provided, one having an N-intein and a first test agent and the
other having a C-intein and a second test agent. At least one of the two fusion constructs
has an inactive reporter capable of being converted to an active reporter upon trans-
splicing mediated by the N-intein and the C-intein. The two fusion constructs are then
mixed and incubated together, or otherwise allowed to contact each other, under
appropriate conditions. Each of the two fusion constructs should be designed such
that the interaction between the first and second test agents can be determined by
detecting or measuring the active reporter in the assay system.

Optionally, a control assay is conducted in parallel with the detection assay.
Typically, in the control assay, the potential interaction between the two test agents being
assayed in the detection assay of this invention is pre-empted, eliminated, or inhibited.
For example, in one control assay, control fusion constructs are used in which two
known agents that do not interact with each other are included in lieu of the first and
second test agents, respectively. Because the known agents in the control fusion
constructs do not interact with each other, any active reporter signal in the control assay
is a background signal. Alternatively, in another control assay, the control fusion
constructs do not contain the first or second test agents. In other words, the control
fusion constructs differ from those in a detection assay in that they do not contain
test agents. Thus, any active reporter signal in the control assay would not be the
result of interaction between the test agents.

Preferably, a control assay utilizes the same two fusion constructs as those in a
detection assay, which contain a first and a second test agent, respectively. However, the
control assay is conducted in the presence of an inhibitor that interferes with the
interaction between the first and second test agents in the fusion constructs. Typically,
the inhibitor is an agent that interacts with one of the two test agents in a manner such
that the interaction between the two test agents is disrupted and, as a result, the active
reporter that would normally be formed upon interaction between the two test agents is
not produced. Conveniently, one of the two test agents can be used as the inhibitor. Such an
agent should be in a free, non-hybrid form or in a hybrid form that will not cause the
formation of the active reporter upon interaction between this hybrid form and the
other test agent in one of the two fusion constructs. For example, if the test agent used as
an inhibitor is a protein, it can be conveniently expressed from an expression vector
containing a gene sequence encoding the protein.

The level of detectable active reporter in the control assay is compared to that in
the detection assay. As a result, positive signals indicating specific interactions in the
detection assay can be confirmed and distinguished from background signals inherent in
the assay system. A control assay is especially useful when the N-intein and C-intein
used in the fusion constructs can interact with each other.
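
One way to make the comparison between the detection and control assays explicit is to express the result as fold signal over background. The following Python sketch assumes replicate reporter readings on an arbitrary linear scale. It also uses a hypothetical 3-fold cutoff. Neither the cutoff nor the example numbers come from the method described here; in practice both would be set empirically for each reporter and intein pair.

```python
from statistics import mean


def fold_over_background(detection: list[float], control: list[float]) -> float:
    """Mean active-reporter signal in the detection assay divided by the mean in
    the control assay (the background inherent in the assay system)."""
    background = mean(control)
    if background <= 0:
        raise ValueError("control signal must be positive to form a ratio")
    return mean(detection) / background


def is_specific(detection: list[float], control: list[float], min_fold: float = 3.0) -> bool:
    # min_fold is a hypothetical cutoff; a higher value may be needed when the
    # N-intein and C-intein themselves interact weakly.
    return fold_over_background(detection, control) >= min_fold


detection = [900.0, 1100.0, 1000.0]  # illustrative readings, not real data
control = [100.0, 120.0, 80.0]
print(fold_over_background(detection, control))  # 10.0
print(is_specific(detection, control))           # True
```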

A control assay can also be conducted simultaneously with the testing assay in the
same reaction mixture. In this case, the third and fourth fusion constructs described
above should contain a second reporter different from that in the first and second fusion
constructs, such that the inability of the third and fourth fusion constructs to interact with
each other can be demonstrated by detecting the presence or absence of an active form of
the second reporter.

As will be apparent to a skilled artisan, any arrangement of the components in
the fusion constructs of the present invention can be adopted so long as the trans-splicing
mediated by the N- and C-intein and initiated by a specific interaction between the test
agents can be detected by measuring the active reporter produced during the protein
splicing process.

In one embodiment, as shown in Figure 3A, one fusion construct has a first test
agent X fused or conjugated to the C-terminus of an N-intein, while the other fusion
construct has a second test agent Y fused to the N-terminus of a C-intein and an inactive
reporter R fused to the C-terminus of the same C-intein. Upon trans-splicing, the reporter
is excised and becomes a free, detectable active reporter R*.

In another embodiment, as shown in Figure 3B, one fusion construct has a first
test agent X fused to the C-terminus of an N-intein and an inactive reporter R fused to
the N-terminus of the same N-intein. The other fusion construct includes a second test
agent Y fused to the N-terminus of a C-intein. After trans-splicing mediated by the N-
and C-intein, a detectable free active reporter R* is released.

Figure 3C illustrates the fusion construct arrangement in another embodiment of
the invention. The first fusion construct consists of a first portion of a reporter R (R1)
fused to the N-terminus of an N-intein and a first test agent (X) fused to the C-terminus
of the same N-intein. The second fusion construct consists of a second test agent (Y)
fused to the N-terminus of a C-intein and the remaining portion of the reporter R (R2)
fused to the C-terminus of the same C-intein. In this manner, upon intein-directed trans-
splicing, the two portions of the reporter R are ligated together, thus forming a detectable
active reporter R.

Figure 3D is a diagram showing the fusion construct design in yet another
embodiment of the present invention. The first fusion construct consists of a first test
agent (X) fused to a first portion of a reporter R (R1), which in turn is fused to the N-
terminus of an N-intein. The second fusion construct consists of a C-intein, the
remaining portion of the reporter R (R2) fused to the C-terminus of the C-intein, and a
second test agent (Y) fused to R2. If the test agents X and Y interact with each other to
bring the N-intein and C-intein close together, trans-splicing will result in a detectable
construct X-R-Y.

Yet another arrangement of the fusion constructs is shown in Figure 3E.
The first construct is composed of a first portion of a reporter R (R1) fused to the N-
terminus of an N-intein and a test agent (X) fused to the C-terminus of the same N-intein.
The second construct has a C-intein, the remaining portion of the reporter R (R2) fused to
the C-terminus of the C-intein, and another test agent (Y) fused to R2. Assuming test
agents X and Y interact with each other, thus bringing the N-intein and C-intein close
together, trans-splicing can occur, resulting in a detectable construct R-Y.

Figure 3F illustrates yet another possible arrangement of the fusion constructs of
the present invention. As shown in Figure 3F, the first fusion construct has a test agent
(X) fused to a first portion of a reporter R (R1), which is in turn fused to the N-terminus of
an N-intein. The second fusion construct includes another test agent (Y) fused to the N-
terminus of a C-intein and the remaining portion of the reporter R (R2) fused to the C-
terminus of the same C-intein. Assuming test agents X and Y interact with each other,
thus bringing the N-intein and C-intein close together, trans-splicing can occur, resulting
in a detectable construct X-R.
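
The six arrangements of Figures 3A to 3F follow a single rule: whatever lies N-terminal to the N-intein is joined to whatever lies C-terminal to the C-intein, and the inteins are excised. The sketch below is a minimal symbolic model of that rule. It assumes constructs are written as lists of labels from N- to C-terminus, with `IN` and `IC` for the inteins. `R1` and `R2` stand for the reporter portions, which read as a single reporter `R` once joined. The labels and function names are illustrative only. The model reproduces the products described for each figure.

```python
def join_reporter_halves(parts: list[str]) -> list[str]:
    # R1 immediately followed by R2 is read as one reconstituted reporter R.
    out: list[str] = []
    for p in parts:
        if p == "R2" and out and out[-1] == "R1":
            out[-1] = "R"
        else:
            out.append(p)
    return out


def trans_splice(n_construct: list[str], c_construct: list[str]):
    """Symbolic model of intein-mediated trans-splicing.

    Everything before "IN" is the N-extein and everything after "IC" is the
    C-extein; splicing excises the inteins (with whatever lies between them)
    and joins the two exteins. Returns (ligated_product, excised_parts).
    """
    i = n_construct.index("IN")
    j = c_construct.index("IC")
    ligated = n_construct[:i] + c_construct[j + 1:]
    excised = (n_construct[i:], c_construct[:j + 1])
    return join_reporter_halves(ligated), excised


FIGURE_3 = {  # (N-intein construct, C-intein construct) per Figures 3A-3F
    "3A": (["IN", "X"], ["Y", "IC", "R"]),
    "3B": (["R", "IN", "X"], ["Y", "IC"]),
    "3C": (["R1", "IN", "X"], ["Y", "IC", "R2"]),
    "3D": (["X", "R1", "IN"], ["IC", "R2", "Y"]),
    "3E": (["R1", "IN", "X"], ["IC", "R2", "Y"]),
    "3F": (["X", "R1", "IN"], ["Y", "IC", "R2"]),
}

for fig, (n_con, c_con) in FIGURE_3.items():
    product, _ = trans_splice(n_con, c_con)
    print(fig, "-".join(product))  # 3A R, 3B R, 3C R, 3D X-R-Y, 3E R-Y, 3F X-R
```

The model deliberately ignores splicing chemistry, the residue requirements at the splice junction, and any non-covalent association of the excised inteins with the bound test agents.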

As discussed above, the test agents can be any chemical compounds and are not
limited to proteins. Likewise, both the inactive and active reporter(s) incorporated into
the fusion constructs can be any suitable chemical compounds so long as specific and
detectable changes can occur in the inactive reporter(s) during trans-splicing. The fusion
constructs can be prepared by chemical synthesis and/or standard recombinant DNA
techniques. For example, when the reporters or test agents are not proteins, the N-intein
and C-intein can be prepared by chemical synthesis or recombinant expression, and
thereafter the non-proteinaceous reporters or test agents can be chemically conjugated to
the N-intein and/or C-intein, either directly or through a linker molecule. Methods
for conjugating a protein or peptide to a molecule such as glycosaccharides, lipids,
steroids, drugs, nucleic acids, and the like are known in the art and should be apparent to
a skilled artisan apprised of the present disclosure. If both the test agents and reporters
are proteins, the fusion constructs can be conveniently produced as fusion proteins by
recombinantly expressing suitable chimeric genes. The fusion proteins can be extracted
in crude cell extract form or purified for the in vitro assay. Purification can be achieved by
conventional purification methods such as standard chromatographic or affinity
techniques.

Naturally occurring, intein-based protein splicing is largely independent of the
amino acid composition of the exteins, with a single exception: the first residue of the
C-extein is invariably cysteine, threonine, or serine. Thus, when a non-protein inactive
reporter or test agent is linked to the C-terminus of the C-intein in a fusion construct of
the present invention, it is preferred that the non-protein entity be conjugated to the C-
intein through a linker such as the amino acid cysteine, serine, or threonine. In the case of
a polypeptide reporter or polypeptide test agent fused to the C-terminus of the C-intein, it
may also be preferred that the first amino acid of the polypeptide immediately following
the C-terminus of the C-intein be cysteine, serine, or threonine. In the event that the C-
terminus of the C-intein is exposed and not fused to any moiety, it may be desirable to
design the C-intein such that it includes an additional amino acid selected from cysteine,
serine, and threonine. Alternatively, a reducing thiol such as cysteine,
mercaptoacetic acid, dithiothreitol, thiophenol, and the like may be added to the assay
system. See e.g., Paulus, Annu. Rev. Biochem., 69:447-496 (2000); Severinov and Muir,
J. Biol. Chem., 273:16205-16209 (1998). In addition, where the N-terminus of an N-
intein in the fusion constructs is linked to a non-protein moiety, it is also preferable
that the chemical linkage between the N-intein and the non-protein moiety be an amide
linkage, and preferably a peptide bond. This can be achieved by using an amino acid as a
linker between the non-protein moiety and the N-terminus of the N-intein.
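
Sometimes a protein reporter or test agent is divided so that one part follows the C-intein. In that case, the preference above for cysteine, serine, or threonine as the first C-extein residue becomes a search for in-frame codons encoding those residues. The sketch below assumes an uninterrupted coding sequence in reading frame 0 and the standard genetic code. `candidate_split_sites` is a hypothetical helper, and the toy sequence is not a real reporter gene.

```python
# Standard-code codons for the residues preferred at the first C-extein position.
CYS_SER_THR = {
    "TGT": "C", "TGC": "C",
    "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S", "AGT": "S", "AGC": "S",
    "ACT": "T", "ACC": "T", "ACA": "T", "ACG": "T",
}


def candidate_split_sites(cds: str) -> list[tuple[int, int, str]]:
    """Return (nucleotide_offset, residue_number, residue) for each in-frame Cys,
    Ser, or Thr codon. Cutting the coding sequence at nucleotide_offset makes that
    residue the first amino acid of the C-terminal portion."""
    seq = cds.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    sites = []
    # Start at the second codon so the N-terminal portion is never empty.
    for offset in range(3, len(seq), 3):
        residue = CYS_SER_THR.get(seq[offset:offset + 3])
        if residue is not None:
            sites.append((offset, offset // 3 + 1, residue))
    return sites


toy_cds = "ATGGCTTGTAAAAGCTAA"  # toy sequence: M A C K S stop
print(candidate_split_sites(toy_cds))  # [(6, 3, 'C'), (12, 5, 'S')]
```

Each offset is only a candidate. Whether the two resulting portions can be religated into an active protein still has to be checked experimentally.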

The detection assay in accordance with the present invention is conducted in
vitro. The fusion constructs, in crude cell extracts or in purified form, can be mixed and
incubated together under appropriate conditions that allow interactions between the test
agents. Methods for performing in vitro trans-splicing assays are disclosed, e.g., in U.S.
Patent No. 5,834,247, which is incorporated herein by reference. It is noted that different
agents may require different conditions for their interactions. As a starting point, for
example, a buffer containing 20 mM Tris-HCl, pH 7.0, and 500 mM NaCl may be used.
Several parameters may be varied, including temperature, pH, salt
concentration, reducing agent, incubation time, and the like. Some minor degree of
experimentation may be required to determine the optimum incubation conditions, this
being well within the capability of one skilled in the art once apprised of the present
disclosure. Cell-free in vitro assays are especially suitable where the fusion constructs
contain non-protein elements that cannot be synthesized by recombinant DNA
technologies. In addition, in vitro assays eliminate the constraints created by cell
compartments and are useful in detecting interactions that may not be detectable in
certain in vivo assays known in the art.
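
The starting buffer above can be prepared from concentrated stocks using the dilution relation C1V1 = C2V2. The sketch assumes a 1 M Tris-HCl (pH 7.0) stock, a 5 M NaCl stock, and a 50 mL final volume. These stock concentrations and the volume are illustrative choices, not values given in this disclosure.

```python
def stock_volume_ml(final_mm: float, stock_mm: float, total_ml: float) -> float:
    """Solve C1 * V1 = C2 * V2 for V1, the volume of stock to add."""
    return final_mm * total_ml / stock_mm


def buffer_recipe(total_ml: float, components: dict[str, tuple[float, float]]) -> dict[str, float]:
    """components maps name -> (final concentration in mM, stock concentration in mM)."""
    volumes = {name: stock_volume_ml(final, stock, total_ml)
               for name, (final, stock) in components.items()}
    water = total_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks too dilute for the requested final volume")
    volumes["water"] = water
    return volumes


# Assumed stocks: 1 M (1000 mM) Tris-HCl, pH 7.0, and 5 M (5000 mM) NaCl.
recipe = buffer_recipe(50.0, {
    "Tris-HCl, pH 7.0": (20.0, 1000.0),
    "NaCl": (500.0, 5000.0),
})
print(recipe)  # {'Tris-HCl, pH 7.0': 1.0, 'NaCl': 5.0, 'water': 44.0}
```

The same helper can be reused when salt concentration or other parameters are varied during optimization, by changing the target concentrations.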

In a specific embodiment, a fusion construct that is a fusion protein is
recombinantly expressed in a host cell and secreted from the host cell. For this
purpose, a signal peptide or secretion signal is preferably included in the fusion protein to
enable the recombinantly synthesized fusion protein to be secreted into the extracellular
environment. Preferably, the fusion protein lacks a membrane-anchoring domain, so that
the fusion protein is secreted into the extracellular environment, allowing the detection
assay to be conducted in vitro without having to purify the fusion protein. Thus, in a
more specific embodiment, each member of a pair of secretable fusion proteins according
to the present invention is separately expressed in host cells. To conduct the in vitro
assay of the present invention, the different host cells expressing the fusion proteins can
be mixed or co-cultured such that the fusion proteins secreted from the host cells are
allowed to interact with each other. Protein trans-splicing is then determined.
Essentially, by making the fusion proteins secretable, a step of purifying or extracting
the fusion proteins is obviated.

For recombinant expression of fusion proteins, chimeric genes encoding the
fusion proteins can be introduced into the appropriate host cells. For this purpose, the
expression vectors and host cells used in various two-hybrid systems developed in the art
may be adapted and incorporated in the assays. Such two-hybrid systems are generally
disclosed in U.S. Patent Nos. 5,283,173; 5,525,490; 5,585,245; 5,637,463; 5,695,941;
5,733,726; 5,776,689; 5,885,779; 5,905,025; 6,037,136; 6,057,101; 6,114,111; and Bartel
and Fields, eds., The Yeast Two-Hybrid System, Oxford University Press, New York, NY,
1997, all of which are incorporated herein by reference.

Typically, two chimeric genes are prepared encoding two fusion constructs as
described above, containing an N-intein and a C-intein, respectively. For convenience,
the two test polypeptides whose interaction is to be determined are referred
to as the "bait polypeptide" and the "prey polypeptide," respectively. The chimeric genes
encoding the fusion constructs containing the bait and prey polypeptides are termed the "bait
chimeric gene" and the "prey chimeric gene," respectively. Typically, a "bait vector" and a
"prey vector" are provided for the expression of a bait chimeric gene and a prey chimeric
gene, respectively.

Many types of vectors can be used for the present invention. Methods for the
construction of bait vectors and prey vectors should be apparent to those skilled in the
art apprised of the present disclosure. See generally, Current Protocols in Molecular
Biology, Vol. 2, Ed. Ausubel, et al., Greene Publish. Assoc. & Wiley Interscience, Ch.
13, 1988; Glover, DNA Cloning, Vol. II, IRL Press, Wash., D.C., Ch. 3, 1986; Bitter, et
al., in Methods in Enzymology 153:516-544 (1987); The Molecular Biology of the Yeast
Saccharomyces, Eds. Strathern et al., Cold Spring Harbor Press, Vols. I and II, 1982; and
Rothstein in DNA Cloning: A Practical Approach, Vol. II, Ed. DM Glover, IRL Press,
Wash., D.C., 1986.

Generally, the bait and prey vectors may include a promoter operably linked to a
chimeric gene for the transcription of the chimeric gene, an origin of DNA replication for
the replication of the vectors in host cells and a replication origin for the amplification of
the vectors in, e.g., E. coli, and selection marker(s) for selecting and maintaining only
those host cells harboring the vectors. Additionally, the vectors preferably also contain
inducible elements, which function to control the expression of the chimeric gene.
Making the expression of the chimeric genes inducible and controllable is especially
important in the event that the fusion proteins or components thereof are toxic to the host
cells. Other regulatory sequences such as transcriptional enhancer sequences and
translation regulation sequences (e.g., the Shine-Dalgarno sequence) can also be included.
Termination sequences such as the bovine growth hormone, SV40, lacZ, and AcMNPV
polyhedrin polyadenylation signals may also be operably linked to the chimeric gene. An
epitope tag coding sequence for detection and/or purification of the fusion proteins can
also be incorporated into the expression vectors. Examples of useful epitope tags
include, but are not limited to, influenza virus hemagglutinin (HA), Simian Virus 5 (V5),
polyhistidine (6xHis), c-myc, lacZ, GST, and the like. Proteins with polyhistidine tags
can be easily detected and/or purified with Ni affinity columns, while specific antibodies
to many epitope tags are generally commercially available. Bait and prey vectors may
also contain components (e.g., a signal peptide) that direct the expressed protein
extracellularly. The vectors can be introduced into the host cells by any technique
known in the art, e.g., by direct DNA transformation, microinjection, electroporation,
viral infection, lipofection, gene gun, and the like. The bait and prey vectors can be
maintained in host cells in an extrachromosomal state, i.e., as self-replicating plasmids or
viruses. Alternatively, one or both vectors can be integrated into chromosomes of the
host cells by conventional techniques such as selection of stable cell lines or site-specific
recombination.

The fusion proteins can be expressed in many different host cells, including but
not limited to bacteria, yeast cells, plant cells, insect cells, and mammalian cells. A
skilled artisan will recognize that the design of the vectors can vary with the host cells
used. In one embodiment, the assay is conducted in prokaryotic cells such as Escherichia
coli, Salmonella, Klebsiella, Pseudomonas, Caulobacter, and Rhizobium. Suitable
origins of replication for the expression vectors useful in this embodiment of the present
invention include, e.g., the ColE1, pSC101, SV40, and M13 origins of replication.
Examples of suitable promoters include the T7 promoter, the lacZ
promoter, and the like. In addition, inducible promoters are also useful in modulating the
expression of the chimeric genes. For example, the lac operon from bacteriophage
lambda plac5 is well known in the art and is inducible by the addition of IPTG to the
growth medium. Other known inducible promoters useful in a bacterial expression
system include pL of bacteriophage λ, the lac promoter, the trp promoter, hybrid
promoters such as the tac promoter, promoters such as the T7 promoter fused to
transcriptional control elements like lacO, and the like.

In addition, selection marker sequences for selecting and maintaining only those
prokaryotic cells expressing the desired fusion proteins should also be incorporated into
the expression vectors. Numerous selection markers, including auxotrophic markers and
antibiotic resistance markers, are known in the art and can all be useful for purposes of
this invention. For example, the bla gene, which confers ampicillin resistance, is the most
commonly used selection marker in prokaryotic expression vectors. Other suitable
markers include genes that confer neomycin, kanamycin, or hygromycin resistance to the
host cells. In fact, many vectors are commercially available from vendors such as
Invitrogen Corp. of San Diego, Calif., Clontech Corp. of Palo Alto, Calif., BRL of
Bethesda, Maryland, and Promega Corp. of Madison, Wis. These commercially
available vectors, e.g., pBR322, pSPORT, pBluescript II SK, pcDNAI, and pcDNAII, all
have a multiple cloning site into which the chimeric genes of the present invention can be
conveniently inserted using conventional recombinant techniques. The constructed
expression vectors can be introduced into host cells by various transformation or
transfection techniques generally known in the art.

In another embodiment, mammalian cells are used as host cells for the expression
of the fusion proteins. For this purpose, virtually any mammalian cell can be used,
including normal tissue cells, stable cell lines, and transformed tumor cells.
Conveniently, mammalian cell lines such as CHO cells, Jurkat T cells, NIH 3T3 cells,
HEK-293 cells, CV-1 cells, COS-1 cells, HeLa cells, VERO cells, MDCK cells, WI38
cells, and the like are used. Mammalian expression vectors are well known in the art and
many are commercially available. Examples of suitable promoters for the transcription
of the chimeric genes in mammalian cells include viral transcription promoters derived
from adenovirus, simian virus 40 (SV40) (e.g., the early and late promoters of SV40),
Rous sarcoma virus (RSV), cytomegalovirus (CMV) (e.g., the CMV immediate-early
promoter), human immunodeficiency virus (HIV) (e.g., the long terminal repeat (LTR)),
vaccinia virus (e.g., the 7.5K promoter), and herpes simplex virus (HSV) (e.g., the
thymidine kinase promoter). Inducible promoters can also be used. Suitable inducible
promoters include, for example, the tetracycline responsive element (TRE) (see Gossen
et al., Proc. Natl. Acad. Sci. USA, 89:5547-5551 (1992)), the metallothionein IIA promoter,
the ecdysone-responsive promoter, and heat shock promoters. Suitable origins of replication
for the replication and maintenance of the expression vectors in mammalian cells include,
e.g., the Epstein-Barr origin of replication in the presence of the Epstein-Barr nuclear
antigen (see Sugden et al., Mol. Cell. Biol., 5:410-413 (1985)) and the SV40 origin of
replication in the presence of the SV40 T antigen (which is present in COS-1 and COS-7
cells) (see Margolskee et al., Mol. Cell. Biol., 8:2837 (1988)). Suitable selection
markers include, but are not limited to, genes conferring resistance to neomycin,
hygromycin, zeocin, and the like. Many commercially available mammalian expression
vectors may be useful for the present invention, including, e.g., pCEP4, pcDNAI, pIND,
pSecTag2, pVAX1, pcDNA3.1, pBI-EGFP, and pDisplay. The vectors can be
introduced into mammalian cells using any known technique, such as calcium phosphate
precipitation, lipofection, electroporation, and the like. The bait vector and prey vector
are preferably expressed in different cells.

Viral expression vectors, which permit introduction of recombinant genes into
cells by viral infection, can also be used for the expression of the fusion proteins.
Typically, viral vectors having the chimeric genes incorporated therein are viable and can
be easily introduced into host cells by viral infection. Viral expression vectors generally
known in the art include viral vectors based on adenovirus, bovine papilloma virus,
murine stem cell virus (MSCV), MFG virus, and retrovirus. See Sarver et al., Mol. Cell.
Biol., 1:486 (1981); Logan & Shenk, Proc. Natl. Acad. Sci. USA, 81:3655-3659 (1984);
Mackett et al., Proc. Natl. Acad. Sci. USA, 79:7415-7419 (1982); Mackett et al., J.
Virol., 49:857-864 (1984); Panicali et al., Proc. Natl. Acad. Sci. USA, 79:4927-4931
(1982); Cone & Mulligan, Proc. Natl. Acad. Sci. USA, 81:6349-6353 (1984); Mann et al.,
Cell, 33:153-159 (1993); Pear et al., Proc. Natl. Acad. Sci. USA, 90:8392-8396 (1993);
Kitamura et al., Proc. Natl. Acad. Sci. USA, 92:9146-9150 (1995); Kinsella et al., Human
Gene Therapy, 7:1405-1413 (1996); Hofmann et al., Proc. Natl. Acad. Sci. USA,
93:5185-5190 (1996); Choate et al., Human Gene Therapy, 7:2247 (1996); WO
94/19478; Hawley et al., Gene Therapy, 1:136 (1994); and Rivere et al., Genetics,
92:6733 (1995), all of which are incorporated by reference.

Generally, to construct a viral vector, a chimeric gene according to the present
invention can be operably linked to a suitable promoter. The promoter-chimeric gene
construct is then inserted into a non-essential region of the viral vector, typically a
modified viral genome. This results in a viable recombinant virus capable of expressing
the fusion protein encoded by the chimeric gene in infected host cells. Once in the host
cell, the recombinant virus typically is integrated into the genome of the host cell.
However, recombinant bovine papilloma viruses typically replicate and remain as
extrachromosomal elements.

In another embodiment, the fusion proteins are expressed in plant cells. Methods
for expressing exogenous proteins in plant cells are well known in the art. See generally,
Weissbach & Weissbach, Methods for Plant Molecular Biology, Academic Press, NY,
1988; Grierson & Corey, Plant Molecular Biology, 2d Ed., Blackie, London, 1988.
Recombinant virus expression vectors based on, e.g., cauliflower mosaic virus (CaMV)
or tobacco mosaic virus (TMV) can be used. Alternatively, recombinant plasmid
expression vectors such as Ti plasmid vectors and Ri plasmid vectors are also useful.
The chimeric genes encoding the fusion proteins of the present invention can be
conveniently cloned into the expression vectors and placed under the control of a viral
promoter, such as the 35S RNA and 19S RNA promoters of CaMV or the coat protein
promoter of TMV, or of a plant promoter, e.g., the promoter of the small subunit of
RUBISCO and heat shock promoters (e.g., soybean hsp17.5-E or hsp17.3-B promoters).

In addition, the fusion proteins can also be expressed in insect cells, e.g.,
Spodoptera frugiperda cells, using a baculovirus expression system. Expression vectors
and host cells useful in this system are well known in the art and are generally available
from various commercial vendors. For example, the chimeric genes of the present
invention can be conveniently cloned into a non-essential region (e.g., the polyhedrin
gene) of an Autographa californica nuclear polyhedrosis virus (AcNPV) vector and
placed under control of an AcNPV promoter (e.g., the polyhedrin promoter). The non-
occluded recombinant viruses thus generated can be used to infect host cells such as
Spodoptera frugiperda cells, in which the chimeric genes are expressed. See Smith, U.S.
Patent No. 4,215,051.

As described above, each of the two fusion constructs should be designed such
that the interaction between the first and second test agents is determinable by detecting
or measuring changes in the reporter in the assay system. It will be apparent from the
above discussion that the reporter can be any molecule or moiety so long as changes in the
reporter that are specifically associated with intein-mediated trans-splicing are detectable.

Conveniently, the occurrence of trans-splicing can be detected by detecting changes in the size of the reporter. For example, the sizes of the various components of the fusion constructs can be designed such that the "active reporter," which is generated when the "inactive reporter" is simply cleaved off from one of the fusion constructs or recombined with one or more other components of the fusion constructs, is distinguishable from its precursor(s) and other trans-splicing products based on size, i.e., molecular weight. The inactive reporter can be pre-labeled with, e.g., a radioactive isotope, a fluorescent label, or other detectable markers, and the active reporter can be detected by, e.g., gel electrophoresis either before or after purification. Purification can be based on specific affinity columns using an antigen-specific protein, e.g., light-chain immunoglobulin, heavy-chain immunoglobulin, avidin, streptavidin, protein A, and antigenic peptides. Conveniently, the commonly used and commercially available epitope tags may be used as size-based reporters. Such epitope tags include sequences derived from, e.g., influenza virus hemagglutinin (HA), Simian Virus 5 (V5), polyhistidine (6xHis), c-myc, lacZ, GST, and the like. For example, proteins with polyhistidine tags can be easily detected and/or purified with Ni affinity columns. One advantage of using such epitope tags is that specific antibodies to many of them are commercially available. Alternatively, an antibody specific to the "active reporter" can be used to detect the level of the active reporter generated in the assay without purification.
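Whether a size-based reporter will work depends on the active reporter and its precursor differing enough in molecular weight to separate on a gel. The following is a minimal sketch, assuming unmodified linear polypeptides and standard average residue masses, for estimating those weights from sequence. The function names, the placeholder glycine segment, and the 5% resolvability threshold are hypothetical illustrations, not values taken from this disclosure.

```python
# Average residue masses in daltons (free amino acid minus one water),
# standard values for the 20 common amino acids.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini


def approx_mw(seq: str) -> float:
    """Approximate average molecular weight (Da) of an unmodified linear polypeptide."""
    seq = seq.upper()
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def size_resolvable(mw_a: float, mw_b: float, min_fraction: float = 0.05) -> bool:
    """Hypothetical rule of thumb: treat two species as separable on a gel if
    their masses differ by at least min_fraction of the larger mass."""
    return abs(mw_a - mw_b) >= min_fraction * max(mw_a, mw_b)


if __name__ == "__main__":
    tag = "HHHHHH"               # 6xHis epitope tag
    precursor = tag + "G" * 150  # placeholder for the tag still attached to a construct segment
    print(round(approx_mw(tag), 1))
    print(size_resolvable(approx_mw(precursor), approx_mw(tag)))
```

In practice the real sequences of the fusion constructs would replace the placeholders, and post-translational modifications or labels would shift the estimates.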

In another embodiment, the fusion constructs are designed such that the active reporter produced during intein-mediated trans-splicing can be detected by a color-based assay. For example, when an N-terminal portion of the lacZ protein (β-galactosidase) is fused to the N-terminus of an N-intein in one fusion construct and a C-terminal portion of the lacZ protein is fused to the C-terminus of a C-intein in another fusion construct, protein trans-splicing will religate the N- and C-terminal portions of the lacZ protein to form a full-length, active lacZ protein. Thus, in the presence of a substrate for β-galactosidase (e.g., X-Gal, i.e., 5-bromo-4-chloro-3-indolyl-β-D-galactoside), trans-splicing can be detected by the appearance of a blue color or by a quantitative colorimetric assay. To produce the chimeric genes in this embodiment of the invention, the lacZ gene encoding β-galactosidase can be divided in any manner into a 5' portion and a 3' portion encoding an N-terminal portion and a C-terminal portion of β-galactosidase, respectively. As discussed above, to facilitate protein splicing it may be advantageous for the first amino acid immediately following the C-intein to be cysteine, serine, or threonine. Thus, where possible, the lacZ gene is divided immediately before a codon for cysteine, serine, or threonine, such that the first amino acid of the C-terminal portion of β-galactosidase, immediately following the C-intein in a fusion construct, is one of the three preferred amino acids. Mutations may also be introduced into the lacZ gene to substitute a cysteine, serine or threonine for another amino acid, or for any other purpose, so long as the mutation does not adversely interfere with protein trans-splicing or with detection of the active reporter protein, i.e., β-galactosidase.
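Choosing where to divide the reporter gene reduces to scanning the coding sequence, in frame, for codons encoding cysteine, serine, or threonine. The following is a minimal sketch under that assumption; the function names and the toy open reading frame are hypothetical, and the toy sequence is not the real lacZ sequence. A real design would also weigh how the split affects folding and complementation of the two reporter fragments.

```python
# Standard-code codons for the three residues preferred immediately after the C-intein.
SPLICE_FRIENDLY = {
    "TGT": "C", "TGC": "C",
    "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S", "AGT": "S", "AGC": "S",
    "ACT": "T", "ACC": "T", "ACA": "T", "ACG": "T",
}


def candidate_split_sites(cds: str):
    """List (nt_offset, residue_number, residue) for every in-frame Cys/Ser/Thr codon.

    Cutting the coding sequence at nt_offset makes that residue the first amino
    acid of the C-terminal fragment. The first codon is skipped so that the
    N-terminal fragment is never empty. Residue numbers are 1-based.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    sites = []
    for i in range(3, len(cds), 3):
        aa = SPLICE_FRIENDLY.get(cds[i:i + 3])
        if aa:
            sites.append((i, i // 3 + 1, aa))
    return sites


def nearest_split_site(cds: str, target_residue: int):
    """Pick the candidate closest to a desired split position (ties go to the earlier site)."""
    sites = candidate_split_sites(cds)
    if not sites:
        raise ValueError("no Cys/Ser/Thr codon found; consider introducing one by mutation")
    return min(sites, key=lambda s: abs(s[1] - target_residue))


if __name__ == "__main__":
    toy_cds = "ATGGCTTGTAAAAGCTAA"  # hypothetical toy ORF for illustration only
    print(candidate_split_sites(toy_cds))
    print(nearest_split_site(toy_cds, 5))
```

When no suitable codon lies near the desired position, the text's alternative applies: a mutation introducing a cysteine, serine, or threonine codon at the chosen site.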

As will be apparent, many other reporters can be used in a similar manner in the present invention. Such other reporters include, for example, the green fluorescent protein (GFP), which can be detected by fluorescence assay and sorted by fluorescence-activated cell sorting (FACS) (see Cubitt et al., Trends Biochem. Sci., 20:448-455 (1995)), secreted alkaline phosphatase, horseradish peroxidase, the blue fluorescent protein (BFP), and luciferase photoproteins such as aequorin, obelin, mnemiopsin, and berovin (see U.S. Patent No. 6,087,476, which is incorporated herein by reference).

The method of the present invention for detecting protein-protein interactions can also be used to screen a library of fusion proteins. Methods for constructing activation domain or DNA binding domain fusion libraries and their use in the yeast two-hybrid system are well known in the art and are disclosed in, e.g., Vojtek et al., in The Yeast Two-Hybrid System, Bartel and Fields, eds., pages 29-42, Oxford University Press, New York, NY, 1997; Zhu et al., in The Yeast Two-Hybrid System, Bartel and Fields, eds., pages 73-96, Oxford University Press, New York, NY, 1997. The methods described in the above references can all be applied to the present invention with appropriate modifications. By way of example, N-intein fusion libraries can be prepared using an expression vector containing a 5' portion of a reporter gene operably linked to the 5' end of the N-intein coding sequence. Operably linked to the 3' end of the N-intein coding sequence is a multiple cloning site into which various random or predetermined (e.g., cDNA) DNA sequences can be inserted in frame. The DNA library thus prepared can be transformed into appropriate host cells to recombinantly express the fusion proteins encoded by the chimeric genes. Thus, an array of fusion proteins can be expressed, each containing an N-terminal portion of the reporter protein fused to the N-terminus of the N-intein and a random or predetermined polypeptide fused to the C-terminus of the N-intein. A fusion protein including a bait protein fused to the N-terminus of a C-intein and the C-terminal portion of the reporter protein fused to the C-terminus of the C-intein can be mixed with the prey fusion protein library in vitro to identify prey proteins capable of interacting with the bait protein. Similarly, C-intein fusion libraries can also be established and screened using an N-intein-containing fusion protein.
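Because each prey sequence must be inserted in frame downstream of the N-intein coding sequence, a simple computational check on candidate inserts can flag frame shifts and premature stop codons before cloning. The following is a minimal sketch, assuming the standard genetic code and that the insert itself should be a whole number of codons beginning at the first nucleotide; the function name is hypothetical. A stop codon is tolerated only as the final codon, since the prey sits at the C-terminus of this fusion.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def check_prey_insert(insert: str) -> list[str]:
    """Return a list of problems with a prey insert; an empty list means it looks usable."""
    seq = insert.upper()
    problems = []
    if len(seq) % 3:
        problems.append("length is not a multiple of 3 (frame shift)")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # A terminal stop is acceptable; any earlier stop truncates the prey.
    for n, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon {codon} at codon {n}")
    return problems


if __name__ == "__main__":
    for candidate in ["GCTTGTTAA", "GCTTAAGCT", "GCTT"]:
        print(candidate, check_prey_insert(candidate))
```

The design choice here is to report every problem rather than stop at the first, which is convenient when triaging many library inserts at once.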

In yet another embodiment of the detection method of the present invention, the detection assay is used to detect interactions among three or more agents in a trimeric or higher-order complex. See U.S. Patent No. 5,695,941; Chang et al., Cell, 79:131-141 (1994); Tirode et al., J. Biol. Chem., 272:22995-22999 (1997); Van Criekinge et al., Anal. Biochem., 263:62-66 (1998); and Pause et al., Proc. Natl. Acad. Sci. USA, 96:9533-9538 (1999), all of which are incorporated herein by reference. Essentially, the above-described detection assay of this invention involving two fusion constructs is conducted in the presence of one or more other test agents. In this manner, interactions between the two test agents in the fusion constructs that require the participation of the other test agents can be detected.

The other test agents can be small molecule ligands that interact with the test agents in the fusion constructs. Many protein-protein interactions require the presence of a small molecule ligand, which becomes an integral part of the assembly formed by the protein interactions. See Berlin, in The Yeast Two-Hybrid System, Bartel and Fields, eds., pages 259-272, Oxford University Press, New York, NY, 1997. For example, immunosuppressants such as cyclosporin A (CsA), FK506, and rapamycin are known to bind with high affinity to immunophilins, forming protein-drug complexes which, in turn, bind to specific target proteins to inhibit their activities. The classic yeast two-hybrid system has been employed successfully to isolate proteins interacting with the FKBP12/rapamycin complex. See, e.g., Chiu et al., Proc. Natl. Acad. Sci. USA, 91:12574-12578 (1994). A multi-hybrid assay in accordance with the present invention can be conducted in vitro. In an in vitro assay, the small molecule ligands are simply added to the above-described intein-based two-hybrid assay system of the present invention.

Many protein interactions require the participation of other proteins. Thus, the other test agents in the multi-hybrid assay of the present invention can also be proteins. In a specific embodiment, the additional test proteins are enzymes capable of post-translationally modifying at least one of the test polypeptides in the intein-containing fusion constructs of the present invention. See Figure 4. This is especially useful when one or both of the test proteins in the intein-containing fusion proteins are believed to contain consensus sequences for certain modifying enzymes. A two-hybrid system involving modifying enzymes has been disclosed in, e.g., U.S. Patent No. 5,637,463, which is incorporated herein by reference. This system can be applied to the present invention with appropriate modifications, as will be apparent to a skilled artisan apprised of the present disclosure. Examples of useful modifying enzymes include protein kinases, which catalyze protein phosphorylation (e.g., serine/threonine phosphorylation, or tyrosine phosphorylation by tyrosine kinases; see Lioubin et al., Genes Dev., 10:1084-1095 (1996); Keegan et al., Oncogene, 12:1537-1544 (1996)), and enzymes that catalyze fatty acid acylation, ADP-ribosylation, myristylation, and glycosylation.

In accordance with another embodiment of the present invention, the intein-based in vitro assay incorporates microarrays. Essentially, a plurality of fusion constructs according to the present invention are immobilized on a solid substrate to form an array of fusion constructs. One or more other fusion constructs according to the present invention can be contacted with the immobilized fusion constructs under conditions that allow protein-protein interactions and intein-mediated protein trans-splicing. The immobilized fusion constructs may contain N-intein while the other fusion constructs contain C-intein, or vice versa. The formation of active reporter is then detected in the microarray.

In a preferred embodiment, the fusion constructs are configured such that the active reporter generated as a result of protein trans-splicing between a pair of fusion constructs is tethered or covalently linked to one of the fusion constructs, preferably to the fusion construct immobilized on the solid substrate. For example, the fusion constructs can be provided in a configuration according to Figure 3D, Figure 3E or Figure 3F. With the active reporter tethered to the immobilized fusion construct in the microarray, rapid and parallel identification of multiple protein-protein interactions is made possible.
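Once the reporter signal has been read from every spot of the array, positive interactions must be distinguished from background. The following is a minimal sketch of one common hit-calling rule, assuming that several spots carry negative-control constructs and that a spot is positive when its signal exceeds the negative-control mean by three standard deviations. The threshold, the spot identifiers, and the function name are hypothetical choices, not part of this disclosure.

```python
from statistics import mean, stdev


def call_hits(spot_signals: dict, negative_controls: list, k: float = 3.0) -> list:
    """Return spot IDs whose reporter signal exceeds mean + k * SD of the negative controls."""
    if len(negative_controls) < 2:
        raise ValueError("need at least two negative-control spots to estimate spread")
    cutoff = mean(negative_controls) + k * stdev(negative_controls)
    return sorted(spot for spot, signal in spot_signals.items() if signal > cutoff)


if __name__ == "__main__":
    signals = {"A1": 500.0, "A2": 120.0, "B1": 130.0}  # hypothetical spot readings
    negatives = [100.0, 110.0, 90.0, 100.0]            # hypothetical control spots
    print(call_hits(signals, negatives))
```

Tethering the active reporter to the immobilized construct is what makes this per-spot readout meaningful, since each signal can then be attributed to a single immobilized fusion construct.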

For fusion protein constructs, a protein microarray having N-intein- or C-intein-containing fusion proteins of the present invention can be prepared by a number of methods known in the art. An example of a suitable method is that disclosed in MacBeath and Schreiber, Science, 289:1760-1763 (2000). Essentially, glass microscope slides are treated with an aldehyde-containing silane reagent (SuperAldehyde Substrates purchased from TeleChem International, Cupertino, California). Nanoliter volumes of protein samples in phosphate-buffered saline with 40% glycerol are then spotted onto the treated slides using a high-precision contact-printing robot. After incubation, the slides are immersed in a bovine serum albumin (BSA)-containing buffer to quench the unreacted aldehydes and to form a BSA layer that prevents non-specific protein binding in subsequent applications of the microchip. Alternatively, as disclosed in MacBeath and Schreiber, fusion proteins of the present invention can be attached to a BSA-NHS slide by covalent linkages. BSA-NHS slides are fabricated by first attaching a molecular layer of BSA to the surface of glass slides and then activating the BSA with N,N'-disuccinimidyl carbonate. As a result, the lysine, aspartate, and glutamate residues on the BSA are activated and can form covalent urea or amide linkages with protein samples spotted on the slides. See MacBeath and Schreiber, Science, 289:1760-1763 (2000).

Another example of a useful method for preparing the protein microchip is that disclosed in PCT Publication Nos. WO 00/4389A2 and WO 00/04382, both of which are assigned to Zyomyx and are incorporated herein by reference. First, a substrate or chip base is covered with one or more layers of thin organic film to eliminate any surface defects, insulate proteins from the base materials, and ensure a uniform protein array. Next, a plurality of protein-capturing agents (e.g., antibodies, peptides, etc.) are arrayed and attached to the film-covered base. Fusion proteins can then be bound to the capturing agents, forming a protein microarray. The protein microchips are kept in flow chambers with an aqueous solution.

The protein microarray can also be made by the method disclosed in PCT Publication No. WO 99/36576 assigned to Packard Bioscience Company, which is incorporated herein by reference. For example, a three-dimensional hydrophilic polymer matrix, i.e., a gel, is first deposited on a solid substrate such as a glass slide. The polymer matrix gel is capable of expanding or contracting and contains a coupling reagent that reacts with amine groups. Thus, fusion proteins can be contacted with the matrix gel in an expanded aqueous and porous state to allow reactions between the amine groups on the fusion proteins and the coupling reagents, thus immobilizing the fusion proteins on the substrate. Thereafter, the gel is contracted to embed the attached fusion proteins in the matrix gel.


Alternatively, the fusion proteins of the present invention can be incorporated into a commercially available protein microchip, e.g., the ProteinChip System from Ciphergen Biosystems Inc., Palo Alto, CA. The ProteinChip System comprises metal chips having a treated surface that interacts with proteins. Basically, a metal chip surface is coated with a silicon dioxide film. The molecules of interest, such as proteins and protein complexes, can then be attached covalently to the chip surface via a silane coupling agent.

The protein microchips can also be prepared with other methods known in the art, e.g., those disclosed in U.S. Patent Nos. 6,087,102, 6,139,831, 6,087,103; PCT Publication Nos. WO 99/60156, WO 99/39210, WO 00/54046, WO 00/53625, WO 99/51773, WO 99/35289, WO 97/42507, WO 01/01142, WO 00/63694, WO 00/61806, WO 99/61148, WO 99/40434, all of which are incorporated herein by reference.

In accordance with another aspect of the present invention, a method is also provided for selecting a compound capable of modulating an interaction between interacting test agents, including proteins. By "modulating" or "modulation" it is intended to mean that the compound interferes with, weakens, dissociates or disrupts particular protein-protein interactions, or alternatively, initiates, facilitates or stabilizes particular protein-protein interactions.

As discussed above, most proteins exercise their cellular functions through their interactions with other proteins. Protein-protein interactions form the basis of almost all biological processes. Each biological process or cell machine is composed of a network of interacting proteins. For example, many enzymatic reactions are associated with large protein complexes formed by interactions among enzymes, protein substrates and protein modulators. In addition, protein-protein interactions are part of the mechanism of signal transduction and of other basic cellular functions such as cell cycle regulation, gene transcription, and translation. Protein-protein interactions are therefore also involved in various disease pathways. Thus, compounds that modulate particular protein-protein interactions in disease pathways are potential therapeutic agents useful in treating or preventing diseases. In this respect, both compounds capable of interfering with undesirable protein-protein interactions and compounds that trigger or stabilize desirable protein-protein interactions can be useful.

The intein-based in vitro system of the present invention is especially suited for screening such compounds. As will be apparent, the screen assay can be based on any of the above-described embodiments of the intein-based method for detecting protein-protein interaction. Thus, two proteins whose interaction needs to be modulated are used as test proteins in the intein-containing fusion constructs of the present invention. The two fusion constructs, containing N-intein and C-intein respectively, are allowed to interact with each other in the presence of a test compound, and the ability of the test compound to modulate the interaction between the two known proteins is determined by detecting the presence or absence of an active reporter or by measuring the relative level of the active reporter.
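Measuring "the relative level of the active reporter" usually means normalizing each compound well against a no-compound control after subtracting background. The following is a minimal sketch of that normalization, assuming one background reading (e.g., a well lacking one fusion construct) and one no-compound control per plate. The 50% and 150% cut-offs and the function names are hypothetical. For a compound expected to initiate an interaction that is absent without it, the control would sit at background, so the sketch rejects that case and a direct comparison to background would be used instead.

```python
def percent_of_control(signal: float, no_compound: float, background: float) -> float:
    """Background-subtracted reporter signal as a percentage of the no-compound control."""
    window = no_compound - background
    if window <= 0:
        raise ValueError("control signal must exceed background to normalize")
    return 100.0 * (signal - background) / window


def classify(pct: float, inhibit_below: float = 50.0, enhance_above: float = 150.0) -> str:
    """Label a test compound by its effect on the interaction (hypothetical cut-offs)."""
    if pct < inhibit_below:
        return "disrupts/inhibits"
    if pct > enhance_above:
        return "triggers/stabilizes"
    return "no clear effect"


if __name__ == "__main__":
    control, background = 1000.0, 100.0  # hypothetical plate readings
    for well, reading in {"cmpd-1": 280.0, "cmpd-2": 950.0, "cmpd-3": 1900.0}.items():
        pct = percent_of_control(reading, control, background)
        print(well, round(pct, 1), classify(pct))
```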

The screen assay of the present invention can be used to identify compounds capable of triggering or stabilizing particular protein-protein interactions. As is known in the art, many protein-protein interactions require the presence of small molecule ligands or other proteins. For example, immunosuppressants such as cyclosporin A (CsA), FK506, and rapamycin are known to exert their therapeutic effect by mediating the binding of immunophilins to specific target proteins. Thus, two proteins whose interaction needs to be initiated or strengthened by a therapeutic compound are used as test proteins in the intein-based two-hybrid system of the present invention. The fusion proteins are provided and allowed to interact with each other in the presence of one or more test compounds.

The screen assay of the present invention is also useful in identifying compounds capable of interfering with or disrupting particular protein-protein interactions. For example, inhibitors of interactions between pathogen coat proteins and their corresponding receptors on the human cell surface may be selected by the screen assay. Such inhibitors are potential preventive or therapeutic agents against the pathogen. In another example, compounds capable of dissociating interactions between oncogene products and their cellular targets are potential anticancer agents. Again, two proteins of interest whose interaction needs to be disrupted by a therapeutic compound are used as test proteins in the intein-based two-hybrid system of the present invention. The fusion proteins are expressed and allowed to interact with each other in the presence of one or more test compounds.

As will be apparent, the screen assay of the present invention can be applied in a format appropriate for large-scale screening. For example, combinatorial technologies can be employed to construct combinatorial libraries of small organic molecules or small peptides. See generally, e.g., Kenan et al., Trends Biochem. Sci., 19:57-64 (1994); Gallop et al., J. Med. Chem., 37:1233-1251 (1994); Gordon et al., J. Med. Chem., 37:1385-1401 (1994); Ecker et al., Biotechnology, 13:351-360 (1995). Such combinatorial libraries of compounds can be applied to the screen assay of the present invention to isolate specific modulators of particular protein-protein interactions.

Any test compounds may be screened in the screening assays of the present invention to select modulators of a protein-protein interaction. By the term "selecting" or "select" modulators, it is intended to encompass both (a) choosing compounds from a group previously unknown to be modulators of the protein-protein interaction of interest, and (b) testing compounds that are known to be capable of modulating the protein-protein interaction of interest. Both types of compounds are generally referred to herein as "test compounds." The test compounds may include, by way of example, proteins (e.g., antibodies, small peptides, artificial or natural proteins), nucleic acids, and derivatives, mimetics and analogs thereof, and small organic molecules having a molecular weight of no greater than 10,000 daltons, more preferably less than 5,000 daltons. Preferably, the test compounds are provided in library formats known in the art, e.g., in chemically synthesized libraries, recombinantly expressed libraries (e.g., phage display libraries), and in vitro translation-based libraries (e.g., ribosome display libraries).

Peptidic test compounds may be peptides having L-amino acids and/or D-amino acids, phosphopeptides, and other types of peptides. The screened peptides can be of any size, but preferably have fewer than about 50 amino acids. Smaller peptides are easier to deliver into a patient's body. Various forms of modified peptides may also be screened. Like antibodies, peptides can also be provided in, e.g., combinatorial libraries. See generally, Gallop et al., J. Med. Chem., 37:1233-1251 (1994). Methods for making random peptide libraries are disclosed in, e.g., Devlin et al., Science, 249:404-406 (1990). Other suitable methods for constructing peptide libraries and screening peptides therefrom are disclosed in, e.g., Scott and Smith, Science, 249:386-390 (1990); Moran et al., J. Am. Chem. Soc., 117:10787-10788 (1995) (a library of electronically tagged synthetic peptides); Stachelhaus et al., Science, 269:69-72 (1995); U.S. Patent Nos. 6,156,511; 6,107,059; 6,015,561; 5,750,344; and 5,834,318, all of which are incorporated herein by reference. For example, random-sequence peptide phage display libraries may be generated by cloning synthetic oligonucleotides into gene III or gene VIII of an E. coli filamentous phage. The phage thus generated can propagate in E. coli and express the peptides encoded by the oligonucleotides as fusion proteins on the surface of the phage. Scott and Smith, Science, 249:386-390 (1990). Alternatively, the "peptides on plasmids" method may also be used to form peptide libraries. In this method, random peptides may be fused to the C-terminus of the E. coli Lac repressor by recombinant technologies and expressed from a plasmid that also contains Lac repressor-binding sites. As a result, the peptide fusions bind to the same plasmid that encodes them.
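Random-sequence oligonucleotides for peptide libraries are often synthesized with degenerate codons. The text above does not specify a scheme; the sketch below assumes the widely used NNK design (N = A/C/G/T, K = G/T), which gives 32 codons per position covering all 20 amino acids and a single stop codon (TAG). It illustrates how theoretical peptide diversity, DNA diversity, and the expected fraction of stop-free clones scale with peptide length. All function names are hypothetical.

```python
import random


def nnk_oligo(n_codons: int, rng: random.Random) -> str:
    """Random coding region of n_codons NNK codons (N = A/C/G/T, K = G/T)."""
    return "".join(
        rng.choice("ACGT") + rng.choice("ACGT") + rng.choice("GT")
        for _ in range(n_codons)
    )


def peptide_diversity(n: int) -> int:
    """Distinct n-mer peptides over the 20 standard amino acids."""
    return 20 ** n


def nnk_dna_diversity(n: int) -> int:
    """Distinct NNK oligonucleotides of n codons (32 codons per position)."""
    return 32 ** n


def fraction_without_stop(n: int) -> float:
    """Expected fraction of NNK clones lacking a TAG stop (1 of 32 codons is TAG)."""
    return (31 / 32) ** n


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the example is reproducible
    print(nnk_oligo(7, rng))
    print(peptide_diversity(7), nnk_dna_diversity(7), round(fraction_without_stop(7), 3))
```

The comparison of the two diversity figures shows why practical libraries rarely cover all sequences for peptides longer than about six or seven residues.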

Small organic or inorganic non-peptide, non-nucleotide compounds are preferred test compounds for the screening assays of the present invention. They too can be provided in a library format. See generally, Gordon et al., J. Med. Chem., 37:1385-1401 (1994). For example, benzodiazepine libraries are provided in Bunin and Ellman, J. Am. Chem. Soc., 114:10997-10998 (1992), which is incorporated herein by reference. A method for constructing and screening peptoid libraries is disclosed in Simon et al., Proc. Natl. Acad. Sci. USA, 89:9367-9371 (1992). Methods for the biosynthesis of novel polyketides in a library format are described in McDaniel et al., Science, 262:1546-1550 (1993) and Kao et al., Science, 265:509-512 (1994). Various libraries of small organic molecules and methods of construction thereof are disclosed in U.S. Patent Nos. 6,162,926 (multiply-substituted fullerene derivatives); 6,093,798 (hydroxamic acid derivatives); 5,962,337 (combinatorial 1,4-benzodiazepine-2,5-dione library); 5,877,278 (synthesis of N-substituted oligomers); 5,866,341 (compositions and methods for screening drug libraries); 5,792,821 (polymerizable cyclodextrin derivatives); 5,766,963 (hydroxypropylamine library); and 5,698,685 (morpholino-subunit combinatorial library), all of which are incorporated herein by reference.

Other compounds such as oligonucleotides and peptide nucleic acids (PNA), and analogs and derivatives thereof, may also be screened to identify clinically useful compounds. Combinatorial libraries of oligonucleotides are also known in the art. See Gold et al., J. Biol. Chem., 270:13581-13584 (1995).

Once an effective compound is identified, structural analogs or mimetics thereof can be produced based on rational drug design with the aim of improving drug efficacy and stability, and reducing side effects. Methods known in the art for rational drug design can be used in the present invention. See, e.g., Hodgson et al., Bio/Technology, 9:19-21 (1991); U.S. Patent Nos. 5,800,998 and 5,891,628, all of which are incorporated herein by reference. An example of rational drug design is the development of HIV protease inhibitors. See Erickson et al., Science, 249:527-533 (1990).

Preferably, structural information on the protein-protein interaction to be modulated is obtained. For example, each of the interacting pair can be expressed and purified. The purified interacting protein pairs are then allowed to interact with each other in vitro under appropriate conditions. Optionally, the interacting protein complex can be stabilized by crosslinking or other techniques. The interacting complex can be studied using various biophysical techniques including, e.g., X-ray crystallography, NMR, computer modeling, mass spectrometry, and the like. Likewise, structural information




Attorney Docket No. 1418.03 

can also be obtained from protein complexes formed by interacting proteins and a 
compound that initiates or stabilizes the interaction of the proteins. 

In addition, an understanding of the interaction between the proteins of interest in the presence or absence of a modulating compound can also be derived from mutagenesis analysis using the above-described detection method of the present invention. Indeed, the detection method of this invention is particularly useful in analyzing and characterizing protein-protein interactions. In this respect, various mutations can be introduced into the interacting proteins, and the effect of the mutations on the protein-protein interaction is examined by the above-discussed detection method.

Various mutations, including amino acid substitutions, deletions and insertions, can be introduced into a protein sequence using conventional recombinant DNA technologies. Generally, the aim is to map the protein binding sites, so the mutations introduced should affect only the protein-protein interaction and cause minimal structural disturbance. Mutations are preferably designed based on knowledge of the three-dimensional structure of the interacting proteins. Preferably, mutations are introduced to alter charged amino acids or hydrophobic amino acids exposed on the surface of the proteins, since ionic interactions and hydrophobic interactions are often involved in protein-protein interactions. Alternatively, the "alanine scanning mutagenesis" technique is used. See Wells et al., Methods Enzymol., 202:301-306 (1991); Bass et al., Proc. Natl. Acad. Sci. USA, 88:4498-4502 (1991); Bennet et al., J. Biol. Chem., 266:5191-5201 (1991); Diamond et al., J. Virol., 68:863-876 (1994). Using this technique, charged or hydrophobic amino acid residues of the interacting proteins are replaced by alanine, and the effect on the interaction between the proteins is analyzed using the above-described detection method. For example, the entire protein sequence can be scanned in a window of five amino acids. When two or more charged or hydrophobic amino acids appear in a window, those residues are changed to alanine using standard recombinant DNA techniques. The proteins thus mutated are used as "test proteins" in the above-described detection method to examine the effect of the mutations on protein-protein interaction. Preferably, the mutagenesis analysis is conducted both in the presence and in the absence of an identified modulating compound. In this manner, the domains or residues of the proteins important to the protein-protein interaction and/or to the interaction between the modulating compound and the proteins can be identified.
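The windowed alanine-scanning rule just described (scan in windows of five residues; where two or more charged or hydrophobic residues occur in a window, change them to alanine) can be expressed as a short design routine. The following is a minimal sketch under stated assumptions: the residue classes (charged = D, E, K, R; hydrophobic = V, I, L, M, F, W) are hypothetical choices, and consecutive non-overlapping windows are assumed by default (set `step=1` for a sliding window). Each returned mutant would then be tested as a "test protein" in the detection method.

```python
# Hypothetical residue classes; adjust (e.g., add H or Y) to suit the proteins studied.
CHARGED = set("DEKR")
HYDROPHOBIC = set("VILMFW")
TARGETS = CHARGED | HYDROPHOBIC


def alanine_scan_mutants(seq: str, window: int = 5, step: int = 5, min_hits: int = 2):
    """Return (window_start, mutant_sequence, mutated_positions) for each qualifying window.

    Positions are 1-based. A window qualifies when it holds at least min_hits
    charged or hydrophobic residues; all such residues in it become alanine.
    """
    seq = seq.upper()
    mutants = []
    for start in range(0, len(seq), step):
        block = range(start, min(start + window, len(seq)))
        hits = [i for i in block if seq[i] in TARGETS]
        if len(hits) >= min_hits:
            chars = list(seq)
            for i in hits:
                chars[i] = "A"
            mutants.append((start + 1, "".join(chars), [i + 1 for i in hits]))
    return mutants


if __name__ == "__main__":
    toy = "GKGEGSSSSSLGGGG"  # hypothetical toy sequence
    for start, mutant, positions in alanine_scan_mutants(toy):
        print(start, mutant, positions)
```

Each mutant changes only one window, so a loss of reporter signal for a given mutant points to that window as part of the binding site.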

Based on the structural information obtained, structural relationships between the interacting proteins, as well as between the identified compound and the interacting proteins, are elucidated. The moieties and the three-dimensional structure of the identified compound, i.e., the lead compound, that are critical to its modulating effect on the interaction of the known proteins of interest are revealed. Medicinal chemists can then design analog compounds having similar moieties and structures.

In addition, an identified peptide compound capable of modulating particular protein-protein interactions can also be analyzed by the alanine scanning technique to determine the domains or residues of the peptide important to its modulating effect on particular protein-protein interactions. The peptide compound can be used as a lead molecule for rational design of small organic molecules. See Huber et al., Curr. Med. Chem., 1:13-34 (1994).

The residues or domains critical to the modulating effect of the identified compound constitute the active region of the compound, known as its "pharmacophore." Once the pharmacophore has been elucidated, a structural model can be established by a modeling process that may incorporate data from NMR analysis, X-ray diffraction data, alanine scanning, spectroscopic techniques and the like. Various techniques, including computational analysis, similarity mapping and the like, can all be used in this modeling process. See, e.g., Perry et al., in QSAR: Quantitative Structure-Activity Relationships in Drug Design, pp. 189-193, Alan R. Liss, Inc., 1989; Rotivinen et al., Acta Pharmaceutica Fennica, 97:159-166 (1988); Lewis et al., Proc. R. Soc. Lond., 236:125-140 (1989); McKinlay et al., Annu. Rev. Pharmacol. Toxicol., 29:111-122 (1989).

Commercial molecular modeling systems available from Polygen Corporation, Waltham, MA, include the CHARMm program, which performs energy minimization and molecular dynamics functions, and the QUANTA program, which performs the construction, graphic modeling and analysis of molecular structure. Such programs allow interactive construction, visualization and modification of molecules. Other computer modeling programs are also available from BioDesign, Inc. (Pasadena, CA), Hypercube, Inc. (Cambridge, Ontario), and Allelix, Inc. (Mississauga, Ontario, Canada).
A template can be formed based on the established model. Various compounds can then be designed by linking various chemical groups or moieties to the template. Various moieties of the template can also be replaced. In addition, in the case of a peptide lead compound, the peptide or mimetics thereof can be cyclized, e.g., by linking the N-terminus and C-terminus together, to increase its stability. These rationally designed compounds are further tested. In this manner, pharmacologically acceptable and stable compounds with improved efficacy and reduced side effects can be developed. The compounds identified in accordance with the present invention can be incorporated into a pharmaceutical formulation suitable for administration to an individual.

As is apparent from the above description, the present invention provides a powerful, versatile, intein-based in vitro system for detecting and characterizing protein-protein interactions, and for selecting compounds capable of modulating protein-protein interactions. The system is convenient to use and can be easily adapted to high-throughput screening procedures.

All publications and patent applications mentioned in the specification are indicative of the level of those skilled in the art to which this invention pertains. All publications and patent applications are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be obvious that certain changes and modifications may be practiced within the scope of the appended claims.



